ABSTRACT


Automatic theorem proving, that is, theorem proving by
computer, has a wide variety of applications. Chapter I
takes a light-hearted look at some of these applications in
order to motivate the search for a good theorem proving system.
Chapter  II  is  a  theoretical  discussion  of  various  issues  in 
logic  such  as  the  distinction  between  "semantic"  and 
"natural  deduction"  systems  of  logic,  and  the  importance  of 
such  notions  as  completeness  and  the  deduction  theorem.  This 
chapter  also  describes  the  resolution  rule  of  inference  and 
a  natural  deduction  system  due  to  Kalish  &  Montague  (1964). 
Chapter  III  is  a  brief  survey  of  the  history  of  automatic 
theorem  proving  from  the  Logic  Theorist  of  1957  to  recent 
connection  graph  resolution-based  systems  and  certain 
current  natural  deduction  systems.  Chapter  IV  is  a  catalog 
of  problems  that  attend  these  systems.  Chapters  V  and  VI 
describe  a  theorem  proving  program  based  on  the  Kalish  & 
Montague  system.  This  system  attacks  the  proof  of  a  theorem 
by  employing  heuristics  or  strategies  which  are  based  on  the 
type  of  formula  to  be  proved  and  on  previous  lines  of  the 
proof,  and  unlike  resolution  systems,  does  not  require  any 
pre-processing  of  formulae  to  put  them  into  clause  form. 
Chapter  VII  describes  the  theorems  which  can  be  proved  and 
compares  them  to  the  output  of  previous  theorem  provers. 
Some theorems that are statable in other theorem proving
systems cannot be stated in the language treated by the
present theorem prover. But every theorem that is statable in
both systems and provable by any of the other systems can
also be proved by the present system; the reverse, however,
does not hold for any of the published theorem provers. The
final chapter describes some future extensions to the system,
work on which is currently underway. Two Appendices give: (a) printouts of
proofs  of  a  selection  of  theorems,  with  a  commentary  on  how 
this  compares  to  other  systems,  and  (b)  a  pidgin  ALGOL 
statement  of  the  heuristics  employed. 
I. WHY BOTHER?


Polly  Programmer  has  just  written  a  5000  line  program 
in  ANSI  77  FORTRAN.  No  matter  how  talented  Polly  is,  she 
will  not  be  able  to  foresee  all  the  ins  and  outs  of  her 
program. One way to try to find out -- the way she was
taught at university -- would be to run her program on a
wide  variety  of  test  data  to  guarantee  that  it  performs  as 
desired  in  these  cases.  But  the  years  since  university  have 
taught  Polly  that  she  is  remarkably  bad  at  guessing  what 
sorts  of  input  those  silly  users  will  want  to  enter  and  how 
her  program  will  behave.  What  Polly  really  wants  is  a 
program  verifier:  a  program  which  will  mechanically  prove 
that  her  program  satisfies  some  specification,  or  produces 
the  same  output  as  another  program,  or  can  be  executed 
within  certain  time  and  space  bounds.  To  do  this,  there  must 
be  a  formal  program  semantics  given  for  ANSI  77  FORTRAN,  a 
formal  logical  theory,  and  an  automatic  theorem  prover  for 
that  theory.  Such  a  verifier  reduces  the  question  of  whether 
the  program  has  such-and-such  a  property  to  the  question  of 
whether  so-and-so  formulae  are  theorems  of  the  system. 

Jim  Faculty,  Jr.  wants  tenure  in  the  mathematics 
department  at  the  University.  Jim  wishes  to  be  the  "idea 
man"  for  an  interactive  mathematics  prover.  He  would  like  to 
be  able  to  type  in  PROVE ( FERMAT )  and  have  his  automatic 
theorem prover work on the problem for a while. If it should
get  "stuck",  he  would  like  to  be  able  to  suggest  guidance  in 
the  form  of  new  premises  to  be  added  or  lemmas  to  be  proved 
first.  He  reasons  that,  far  from  subverting  the  creative 
aspect  of  mathematics,  much  less  the  tenure  system,  this 
man-machine  interaction  will  free  the  ordinary  working 
mathematician  from  all  the  low-level  drudgery  of  getting 
unimportant  details  correct,  and  allow  him  to  concentrate  on 
the  truly  creative  aspects  of  his  field.  In  this  he  is 
following  the  famous  solution  to  the  four-colour  problem. 

MAO,  the  Mechanical  Admissions  Officer  at  the 
University,  is  reading  over  letters  of  recommendation  from 
undergraduate  teachers  concerning  prospective  graduate 
students.  One  such  letter  reads 

Dear  Sir,  Candidate  X’s  attendance  at  my  classes  has 
been  excellent  and  he  has  been  known  to  produce 
artifacts  which  bear  a  certain  semblance  to  good 
computer  programs.  Sincerely,  Prof.  Y 

MAO's conversational implicature subprogram reasons as
follows:  "Nothing  in  the  literal  meaning  of  Prof.  Y’s  letter 
is  relevant  to  X's  getting  into  graduate  school.  However,  if 
Prof.  Y  really  had  not  wanted  to  say  anything  he  would  not 
have  written.  And  he  must  be  able  to  say  something  relevant 
since  X  was  his  pupil.  Furthermore,  he  knows  that  more 
information  is  wanted,  since  he  has  done  this  before.  He 
must  therefore  be  wishing  to  impart  information  that  he  is 
reluctant  to  write  down.  There  is  a  briefer  and  clearer,  but 
nearly synonymous, way to express "produce artifacts which
....",  namely  "programs  well."  The  most  obvious  supposition 
is  that  Y  is  drawing  attention  to  some  striking  difference 
between  X's  performance  and  that  to  which  "programs  well" 
usually  applies.  It  is  most  likely  that  X's  performance  has 
some  hideous  defect.  I  therefore  conclude  that,  in  Y's 
opinion,  X  is  not  a  good  candidate  for  graduate  study." 

Charlie  Cheater,  a  student  at  the  University,  has  even 
more  grand  desires  than  Polly  Programmer.  He  wants  to  be 
able  to  specify  an  algorithm  and  have  an  automatic  program 
generator  produce  source  code  in  PASCAL.  Charlie  does  not 
view  this  as  cheating.  "After  all,"  he  asks,  "isn't  this 
just  what  compilers  do?  Weren't  the  original  FORTRAN  and 
COBOL compilers called 'automatic program generators'?
What's the difference between generating machine code from a
high  level  language  and  generating  high  level  code  from  a 
super-high  level  language?"  With  this  analogy  in  mind, 
Charlie  has  written  a  "compiler"  which  takes  the  language  of 
his  algorithm  and  (perhaps  after  repeated  passes)  converts 
his  sentences  into  constructs  of  PASCAL.  In  his  "compiler", 
the  expressions  which  occur  during  this  transformation 
process  not  only  specify  the  target  program,  but  also 
specify  conditions  to  be  proved  as  well  as  conditions  to  be 
made true. For an expression which specifies a (sub)program,
the goal is to convert that (sub)program into PASCAL; for an
expression which is a condition to be proved, the goal is to
convert it into the logical constant true; and for an
expression which is a condition to be made true, the goal is
to construct a program that will make the condition true. In
achieving these goals, including the goal of writing PASCAL,
Charlie's "compiler" creates a tree of goals and subgoals.
It then proceeds to establish these goals by invoking a
theorem prover which guarantees that each goal has been
established.

Charlie's  friend  Larry  Lazy  wants  to  be  able  to  specify 
only  input-output  relations  and  have  the  automatic 
programmer  produce  a  PASCAL  program  that  will  also  exhibit 
those  input-output  relations.  As  clever  as  Charlie  was  to 
write  his  "compiler",  Larry  is  even  cleverer.  He  notices 
that  what  he  needs  to  do  is  prove  this  as  a  theorem: 


    (Ax)[Px -> (Ey)Qxy]                                   (1)

where x ranges over input variables and P is the predicate
that the input is expected to satisfy, y ranges over output
variables and Q specifies the relation between each input
variable and its associated output variable after the
execution of the desired program. Now, given sufficient
other axioms so that (1) is provable, the construction of
the proof of (1) will allow the extraction of a program. For
example, certain constructs in the proof will produce
conditional statements, others will produce sequential
statements, and uses of induction axioms will produce
recursive loops.

Felicity Findout is writing a large data base retrieval
question answering system. In her earlier investigations she
discovered that the typical questions to be answered fell
into three categories: (a) those questions requiring yes/no
answers, (b) those questions requiring the name of some
entity (state,...) such as WH-questions, and (c) those
questions  requiring  the  specification  of  a  sequence  of 
actions  such  as  HOW-questions.  Early  on  Felicity  decided 
that  the  natural  way  to  deal  with  type  (a)  questions  was  to 
represent  the  data  base  as  a  set  of  predicate  logic 
statements,  the  question  as  a  declarative  statement,  and 
invoke  a  theorem  prover  to  try  first  to  deduce  the 
question-statement  from  the  data  base.  If  successful,  answer 
"yes",  if  not  then  try  to  deduce  its  negation  from  the  data 
base.  If  successful,  answer  "no";  otherwise  the  answer  is 
"don't  know".  Felicity  uses  a  variable  tracing  method  to 
handle  type  (b)  questions.  Given,  for  instance,  that  Dr. 
Smarts  is  the  Chairman  of  the  Computer  Science  Department, 
this  is  represented  in  the  data  base  as 

    C(s,csd)                                              (2)

And to answer the question "Who is Chairman of the Computer
Science Department?", Felicity's program tries to prove that

    (Ex)C(x,csd)                                          (3)

follows from the information in the data base. It does, of
course: assume that (3) is false, and a contradiction will
ensue from (2) by instantiating the (universally quantified)
assumption that (3) is false to s. All that is needed then
is a way of finding out that it was the instance s which
resulted in the contradiction -- some "variable tracing"
mechanism. Felicity has yet to fully implement a solution to
questions of type (c), but her solution to the
"monkey-banana" problem (which builds upon her type (b)
"variable tracing" mechanism) gives her cause for optimism.

A  monkey  wants  to  eat  a  banana  that  is  suspended 
from  the  ceiling  of  a  room.  The  monkey  is  too  short 
to  reach  the  banana;  however,  it  can  walk  around  the 
room  carrying  a  chair  that  is  in  the  room,  and  it 
can  climb  the  chair  to  reach  the  banana. 

The  relevant  predicates  and  functions  are: 

P(x,y,z,w):  in  state  w,  the  monkey  is  at  x,  the  banana 
at  y,  and  the  chair  at  z 

R(x):  in  state  x,  the  monkey  can  reach  the  banana 
f(x,y,z):  the  state  attained  if  the  monkey  is  initially 
in  state  z  and  walks  from  x  to  y 
g(x,y,z):  the  state  attained  if  the  monkey  is  initially 
in  state  z  and  walks  from  x  to  y  carrying  the  chair 
h(x):  the  state  attained  if  the  monkey  is  initially  in 
state  x  and  climbs  the  chair 

Assume the initial position of the monkey is at a, the
banana at b, the chair at c, and the monkey is in state s.
Then the data base contains

    (Ax)(Ay)(Az)(Aw)[P(x,y,z,w) -> P(z,y,z,f(x,z,w))]

    (Ax)(Ay)(Az)[P(x,y,x,z) -> P(y,y,y,g(x,y,z))]

    (Ax)[P(b,b,b,x) -> R(h(x))]

    P(a,b,c,s)

That  is,  respectively:  in  any  state  the  monkey  can  walk  from 
x to z; in a state where the monkey is beside the chair
located  at  x,  it  can  carry  the  chair  to  any  place  y;  if  the 
chair  and  the  monkey  are  both  under  the  banana,  then  it  can 
climb  the  chair  and  get  the  banana;  and  finally  a  statement 
of  initial  conditions.  Felicity's  program  is  to  prove 
(Ex)Rx,  and  "trace"  the  appropriate  instance.  Her  program 
proves  it  and  yields 
h(g(c,b,f(a,c,s))) 

as  the  relevant  instance,  i.e.,  the  state  resulting  from 
climbing  the  chair  after  having  walked  from  c  to  b  carrying 
the  chair  after  getting  into  the  state  resulting  from 
walking  from  a  to  c  out  of  the  initial  state  s. 
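
The derivation of this instance can be sketched mechanically. The Python fragment below is only an illustration under stated assumptions: it chains forward from the initial fact with the first two axioms until the third applies, rather than performing the refutation with "variable tracing" that Felicity's program is described as using. The tuple encoding and the names `walk`, `carry`, `reaching_state` and `show` are hypothetical.

```python
from collections import deque

# A fact P(x,y,z,w) is encoded as the tuple (x, y, z, w); a state term such
# as f(a,c,s) is encoded as the nested tuple ("f", "a", "c", "s").

def walk(fact):
    # (Ax)(Ay)(Az)(Aw)[P(x,y,z,w) -> P(z,y,z,f(x,z,w))]
    x, y, z, w = fact
    return [(z, y, z, ("f", x, z, w))]

def carry(fact):
    # (Ax)(Ay)(Az)[P(x,y,x,z) -> P(y,y,y,g(x,y,z))]
    # Applies only when the monkey and the chair are at the same place.
    monkey, banana, chair, state = fact
    if monkey != chair:
        return []
    return [(banana, banana, banana, ("g", monkey, banana, state))]

def reaching_state(initial, max_facts=1000):
    """Breadth-first forward chaining; returns a term t with R(t), or None."""
    queue, seen = deque([initial]), {initial}
    while queue and len(seen) <= max_facts:
        fact = queue.popleft()
        monkey, banana, chair, state = fact
        if monkey == banana == chair == "b":
            # (Ax)[P(b,b,b,x) -> R(h(x))]
            return ("h", state)
        for new in walk(fact) + carry(fact):
            if new not in seen:
                seen.add(new)
                queue.append(new)
    return None

def show(term):
    """Render a nested-tuple term in functional notation."""
    if isinstance(term, str):
        return term
    return term[0] + "(" + ",".join(show(arg) for arg in term[1:]) + ")"

answer = reaching_state(("a", "b", "c", "s"))
print(show(answer))  # h(g(c,b,f(a,c,s)))
```

Breadth-first order is chosen so that the shortest chain of actions is found first; the `max_facts` bound is needed because the walking axiom can be applied indefinitely.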

Robbie Robot is ordered to move a newly delivered
computer terminal t from the loading dock l to a point in
the terminal room r, namely the back corner, b(r). In
Robbie's knowledge base is a representation of the physical
relationships between l and r, to wit, that they are each
five meters square, that they are located some 50 meters
apart, that they are joined by a corridor c, that the point
where c and l join contains a closed door d1, that the point
where c and r join contains a closed door d2, that any route
between l and b(r) except the corridor contains
"obstructions" (viz., walls) and it has no action with which
it can change this obstructiveness condition, and finally
that d1 and d2 are the only "obstructions" along the
corridor route between l and b(r). Robbie also has in its
knowledge base a repertoire of primitive actions which
include moving one meter in any direction so long as there
is no obstruction in that new location, and opening doors if
they are located within one meter and it is not holding
anything. For ease of computation, Robbie is also endowed
with a higher order action of the form "if there is no
obstruction between Robbie's present location and location
x, then Robbie can move to x". Robbie also has in its
possession a variety of "general knowledge": that if two
physical objects are located further apart than their size,
then they are not identical; that an open door is no
obstruction; that if Robbie is carrying x and Robbie moves
from y to z, then x moves from y to z; that if x is in y and
y≠z then x is not in z; and that if there is no obstruction
between x and y and there is no obstruction between y and z,
then there is no obstruction between x and z.

How is Robbie to get t from l to b(r)? Well, what
Robbie needs to do is become convinced that t is in b(r).
Being efficient, Robbie will first try to become convinced
that t is already at b(r). So Robbie says to itself: l and r
are five meters long and located 50 meters apart, 50>5,
hence l≠r. b(r) is in r, hence l≠b(r). But t is in l, hence t
is not in b(r). Having convinced itself that there's work to
be done, Robbie goes through its repertoire of actions and
discovers that if Robbie is at b(r) and carrying t then t is
in b(r). Robbie fails at its proof that it is at b(r) and
carrying t, and so again goes through its repertoire of
actions to discover that if there were no obstructions
between l and b(r) it could get to b(r) carrying t. So
Robbie next tries to convince itself that there are no
obstructions. This fails of course, but in discovering this
failure, Robbie also discovers that if it were in the
corridor within one meter of the door and not holding
anything, it could open d2 and there would be no obstruction
between c and b(r). A similar consideration makes Robbie
discover that if it were in l and within one meter of the
door and not holding anything, it could open d1 and there
would be no obstruction between l and c. From these Robbie
concludes that these two actions would make there be no
obstruction between l and b(r). Given that, Robbie finally
concludes that in such a situation it can pick up t (if it
is within one meter) and carry it to b(r), and then t will
be in b(r). Thereby Robbie makes true the statement: t is at
b(r). Robbie then carries out the following sequence of
actions: (a) Go from start position to d1, (b) Open d1, (c)
Go from d1 to d2, (d) Open d2, (e) Go from d2 to start
position, (f) Pick up t, (g) Go from start position to b(r).
(With a somewhat different control structure Robbie might
perform this perhaps less efficient sequence: (a') Pick up
t, (b') Go from start position to d1, (c') Set t down, (d')
Open d1, (e') Pick up t, (f') Go from d1 to d2, (g') Set t
down, (h') Open d2, (i') Pick up t, (j') Go from d2 to
b(r).)

Sammy  Psycho  is  investigating  human  problem  solving 
techniques  and  wants  to  simulate  the  mechanisms  by  which 
people go about their everyday business, in order to better
understand  these  mechanisms.  Sammy  wants  to  find  out  what  it 
is  about  the  human  mind  that  sets  it  apart  from  other 
animals.  Throughout  history  it  has  been  taken  to  be  the 
hallmark  of  being  human  that  humans  are  rational  and  think 
-- and this was always expanded into: is able to employ
syllogistic  reasoning,  is  able  to  solve  mathematical 
problems,  and  (sometimes)  is  able  to  use  language.  In  the 
last  20  years  it  has  been  common  to  mark  off  human 
performance  from  computer  performance  in  terms  of 
"heuristics"  vs.  "algorithms".  A  computer,  so  it  is  claimed, 
blindly  follows  a  (human-devised)  algorithm  and  therefore 
cannot  be  counted  rational,  no  matter  what  its  performance. 
Humans,  on  the  other  hand,  have  a  variety  of  heuristics  or 
strategies  at  their  disposal  and  engage  in  goal-directed 
thought  by  employing  these  strategies  so  long  as  they  appear 
to  be  leading  toward  the  goal,  and  switching  to  another 
strategy  when  the  previous  one  appears  not  to  be  succeeding. 
Sammy  is  not  convinced  that  there  is  really  a  distinction 
between "blind algorithms" and "heuristic strategies"; he
thinks  that  they  merge  into  one  another  and  that  the  real 
difference  between  them  is  a  subjective  impression  of 
unsureness  of  success  in  the  case  of  heuristics.  But  for  now 
he  is  going  to  accept  the  received  distinction  and  try  to 
display  a  theorem  proving  program  which  uses  exclusively 
what  everyone  would  recognize  as  heuristic  strategies.  This 
having  been  done,  Sammy  thinks  he  will  be  in  a  position  to 
simulate the phenomenon of truly human rationality.

Sources: Polly Programmer's desires were fanned recently by
a close reading of Manna & Waldinger (1977) and Boyer &
Moore (1979), as were Charlie Cheater's. Jim Faculty, Jr.
has been reading Bledsoe & Bruell (1974), Guard et al.
(1969), and the discussion of the proof of the four-colour
problem by Appel & Haken (1977). MAO's example letter is
close to some discussed by Grice (1975); some steps toward
implementation of this sort of example are taken by Lehnert
(1977) and Reiter (1978). Larry Lazy spent some considerable
time reading Biermann (1976), Elschlager & Phillips (1979),
Goad (1980), Bibel (1979), and Guiho & Gresse (1980).
Felicity Findout's program has been guided in its
development by, first, the work of Green (1969) and more
particularly the examples and programs found in Chapter 11
of Chang & Lee (1973). Robbie Robot's program is a
development of STRIPS (itself a development of GPS, see
Newell & Simon (1963) and Ernst & Newell (1969)), for which
see Fikes & Nilsson (1971). The line of development goes
through PLANNER-like languages (see Bobrow & Raphael (1974)
and Derksen, Rulifson & Waldinger (1972)), and more
precisely to ABSTRIPS-style systems for hierarchical
planning (Sacerdoti 1974). Sammy Psycho is obviously a
student of Newell & Simon (1972). All of them have read
Nilsson (1980).
II. LOGIC: SOME THEORETICAL REMARKS


A.  Introduction:  Formulae,  Definitions,  and  Abbreviations 

A  logic  is  a  system  of  symbols  organized  in  some 
specified  way.  Of  course,  there  are  a  variety  of  ways  to 
organize  these  symbols;  I  will  use  the  following  definition 
of  formula  as  common  to  all  the  systems  considered.  The 
differences  amongst  the  various  systems  will  therefore  come 
out in their respective definitions of proof. Still later we
shall consider various augmentations of the notion of
formula and concomitant extensions of the definition of
proof.

A variable is one of x_i, y_i, z_i, w_i, where i is a
positive integer

A constant symbol is one of a_i, b_i, c_i, d_i, where i is a
positive integer

A function symbol is one of f_i^j, g_i^j, h_i^j, where i and j
are positive integers (f_i^j is called a j-place function
symbol)

A term is either a variable or a constant or a j-place
function symbol followed by an opening parenthesis,
followed by j terms separated by commas, and followed by
a closing parenthesis.1

A predicate is one of P_i^j, Q_i^j, R_i^j, S_i^j, where j and i
are non-negative integers (P_i^j is called a j-place
predicate).2

A quantifier is one of A and E.

Formulae:

(1) If Π is a j-place predicate, then Π(α_1,...,α_j) is a
formula, where α_1,...,α_j are terms.

(2) If φ and ψ are formulae, so are (φ->ψ), (φ&ψ), (φ+ψ),
and (φ<->ψ)

(3) If φ is a formula, α is a variable, and Q is a
quantifier, then ¬φ and (Qα)φ are formulae.

1 Constants can be assimilated to function symbols by
allowing j (in the function symbol definition) to be
non-negative, and then constants are 0-place function
symbols.

Some  other  ancillary  notions  and  abbreviatory  devices  which 
shall  be  used  below  are:  propositional  formula,  atomic 
formula,  the  main  connective  of  a  formula,  conditional, 
conjunction,  disjunction,  biconditional,  negation, 
quantified  formula,  subformula,  free/bound  occurrence  of  a 
variable,  literal,  conjunctive  normal  form,  disjunctive 
normal  form,  prenex  normal  form,  skolem  normal  form,  clause 
form,  and  clause. 

A propositional formula is a formula wherein (a) there
are no quantifiers and (b) every predicate occurring in it
is 0-place. An atomic formula is of the form: a j-place
predicate followed by j terms. As an abbreviatory device, I
shall use lower case letters p, q, r, s without sub- and
superscripts to describe atomic propositional formulae. If a
formula is not atomic then it is complex and hence is
described by one of the rules (2) or (3) given above for
formulae. If it is by rule (2a), then the main connective is
-> and it is a conditional; if it is by rule (2b), then the
main connective is & and it is a conjunction; if it is by
rule (2c), then the main connective is + and it is a
disjunction; if it is by rule (2d), then the main connective
is <-> and it is a biconditional; if it is by rule (3a), then
the main connective is ¬ and it is a negation; if it is by
rule (3b), then the main connective is the quantifier and it
is a quantified formula. These latter are broken down into
existential formulae where the quantifier symbol is E and
universal formulae where the quantifier symbol is A. If φ is
a quantified formula, then it starts with a quantifier
phrase which is an open parenthesis followed by a quantifier
symbol followed by a variable followed by a closing
parenthesis. The variable in the quantifier phrase is called
the variable of quantification. The scope of the quantifier
phrase is the (unique) formula following the quantifier
phrase. A subformula of φ is a formula which is a physical
part of φ. (Note that φ is a subformula of itself.) An
occurrence of (variable) α in formula φ is bound if and only
if either (a) it is in a quantifier phrase or (b) it occurs
in a subformula of φ which is a quantified formula and the
variable of quantification is α. It is bound by that
quantifier phrase. An occurrence of a variable is free if
and only if it is not bound. As another abbreviatory device,
I shall eliminate subscripts on variables, predicates and
constants except when they are needed for distinguishing one
from another. Function symbols and predicates will have
their superscripts deleted when it is clear how many
arguments they take. A literal is either an atomic formula
or the negation of an atomic formula.

2 A proposition is a 0-place predicate.
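
These definitions translate directly into a small data structure. The following Python sketch is an illustration only: the nested-tuple encoding, the tag names (`"atom"`, `"not"`, and so on) and the functions `kind` and `free_variables` are assumptions introduced here, and variables are recognized simply by beginning with one of the letters x, y, z, w.

```python
# Formulae are nested tuples:
#   ("atom", "P", (t1, ..., tj))   atomic formula with j terms
#   ("not", f)                     negation
#   ("->", f, g), ("&", f, g), ("+", f, g), ("<->", f, g)
#   ("A", "x", f), ("E", "x", f)   quantified formulae
# Terms are strings (variables or constants) or tuples ("f", t1, ..., tj).

KINDS = {
    "->": "conditional", "&": "conjunction", "+": "disjunction",
    "<->": "biconditional", "not": "negation",
    "A": "universal formula", "E": "existential formula",
}

def kind(formula):
    """Classify a formula by its main connective."""
    return KINDS.get(formula[0], "atomic formula")

def is_variable(term):
    return isinstance(term, str) and term[0] in "xyzw"

def term_variables(term):
    """Set of variables occurring in a term."""
    if is_variable(term):
        return {term}
    if isinstance(term, str):          # a constant
        return set()
    result = set()
    for argument in term[1:]:          # skip the function symbol
        result |= term_variables(argument)
    return result

def free_variables(formula):
    """Variables having at least one free occurrence in the formula."""
    tag = formula[0]
    if tag == "atom":
        result = set()
        for term in formula[2]:
            result |= term_variables(term)
        return result
    if tag in ("A", "E"):
        # Occurrences of the variable of quantification inside the scope are bound.
        return free_variables(formula[2]) - {formula[1]}
    result = set()
    for subformula in formula[1:]:
        result |= free_variables(subformula)
    return result

# (Ax)[Px -> (Ey)Qxy] and its scope
scope = ("->", ("atom", "P", ("x",)), ("E", "y", ("atom", "Q", ("x", "y"))))
closed = ("A", "x", scope)
print(kind(closed), free_variables(closed))  # universal formula set()
print(kind(scope), free_variables(scope))    # conditional {'x'}
```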

Disjunctive and conjunctive formulae are taken to obey
the following associative laws:

    ((φ_1&φ_2)&φ_3) = (φ_1&(φ_2&φ_3))

    ((φ_1+φ_2)+φ_3) = (φ_1+(φ_2+φ_3))

As an abbreviatory device then, one can omit the internal
parentheses and widen the definition of conjunctions and
disjunctions to include

    (φ_1&φ_2&φ_3)

    (φ_1+φ_2+φ_3)

and further to include as many conjuncts or disjuncts as we
wish. A formula is in conjunctive normal form (disjunctive
normal form) if and only if it is a conjunction
(disjunction) where all the conjuncts (disjuncts) are
disjunctions (conjunctions) of literals. (Note that as a
special case a single literal is both in conjunctive normal
form and disjunctive normal form.) It is a well known result
that every quantifier-free formula can be put into an
equivalent conjunctive normal form and an equivalent
disjunctive normal form.3 By "equivalent" here is meant that
the following formula will be a theorem of such a standard
logic (see below for the notion of a theorem):

    φ<->φ'

where φ' is the conjunctive normal form or disjunctive
normal form of φ.

3 Under standard interpretations of the logic, which is all
that we shall be concerned with here.

A formula is in prenex normal form if it
is  of  the  form 

    (Q_1 α_1)(Q_2 α_2)...(Q_k α_k)φ

where the Q's are quantifiers, the α's are all distinct, and
φ contains no quantifiers. That is, all quantifiers are "in
front of the matrix." It is again well known that every
formula has an equivalent prenex normal form. Given a
formula in prenex normal form, one can arrive at a skolem
normal form by the following procedure. Start with
the innermost quantifier phrase, working outwards one by one
to the outermost quantifier phrase, and do the following:
(a)  if  the  quantifier  phrase  is  universal,  delete  it  and 
move  to  the  next  outermost  quantifier  phrase,  (b)  if  the 
quantifier  phrase  is  existential  and  there  are  no  universal 
quantifier  phrases  remaining  to  process,  replace  each 
occurrence  of  the  variable  bound  by  that  quantifier  phrase 
by  a  constant  that  has  never  appeared  in  the  formula  up  to 
this  stage,  delete  the  quantifier  phrase,  and  move  on  to  the 
next  outermost  quantifier  phrase,  (c)  if  the  quantifier 
phrase  is  existential  and  there  are  n  more  universal 
quantifier  phrases  left  to  process,  replace  each  occurrence 
of  the  variable  bound  by  that  existential  quantifier  phrase 
by  a  new  n-place  function  symbol  and  make  the  n  arguments  of 
that  function  symbol  be  the  n  variables  mentioned  in  the 
unprocessed  universal  quantifier  phrases,  delete  the 
existential  quantifier  phrase,  and  move  on  to  the  next 
outermost quantifier phrase. It can be shown that the
resulting formula is satisfiable if and only if the starting
prenex normal form formula was satisfiable. (But note that
they are not equivalent in the sense defined above.) Thus
every formula φ has a skolem normal form which is
satisfiable if and only if φ is satisfiable.
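
The skolemization procedure just described can be sketched in Python. This is a minimal illustration under assumptions of its own: the quantifier prefix is given as a list of (quantifier, variable) pairs with the outermost phrase first, the matrix is a nested tuple, and the generated names `sk1`, `sk2`, ... are hypothetical symbols assumed not to occur in the formula already.

```python
from itertools import count

def substitute(expression, variable, replacement):
    """Replace every occurrence of variable in a nested-tuple expression."""
    if expression == variable:
        return replacement
    if isinstance(expression, tuple):
        return tuple(substitute(part, variable, replacement) for part in expression)
    return expression

def skolemize(prefix, matrix):
    """prefix: [(quantifier, variable), ...], outermost first; matrix: quantifier-free."""
    fresh = count(1)
    # Work from the innermost quantifier phrase outwards.
    for position in range(len(prefix) - 1, -1, -1):
        quantifier, variable = prefix[position]
        if quantifier == "A":
            continue  # case (a): the universal phrase is simply deleted
        # Universal phrases still to be processed are those further out.
        outer_universals = [v for q, v in prefix[:position] if q == "A"]
        name = "sk%d" % next(fresh)
        if outer_universals:
            replacement = (name, *outer_universals)  # case (c): n-place function
        else:
            replacement = name                       # case (b): new constant
        matrix = substitute(matrix, variable, replacement)
    return matrix

# (Ax)(Ey)(Az)(Ew) P(x,y,z,w)
prefix = [("A", "x"), ("E", "y"), ("A", "z"), ("E", "w")]
matrix = ("atom", "P", ("x", "y", "z", "w"))
print(skolemize(prefix, matrix))
# ('atom', 'P', ('x', ('sk2', 'x'), 'z', ('sk1', 'x', 'z')))
```

Because each replacement term mentions only universally quantified variables, a later substitution never disturbs an earlier one, so the order of processing affects only the numbering of the new symbols.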

If  the  matrix  of  the  prenex  normal  form  was  in 
conjunctive  normal  form,  then  the  resulting  skolem  normal 
form  will  be  also.  (Equivalently,  the  resulting  skolem 
normal  form  has  an  equivalent  conjunctive  normal  form).  Such 
a formula is said to be in clause form, and each of the
conjuncts  is  a  clause.  Thus  each  clause  is  a  disjunction  of 
literals. 
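
To make the notion of clause form concrete, here is a hedged Python sketch of one standard way to turn a quantifier-free formula into a set of clauses: eliminate -> and <->, drive negations inward, and distribute + over &. The encoding and the names `nnf`, `clauses` and `clause_form` are assumptions made for this illustration, not part of any system discussed in the text.

```python
# Formulae: ("atom", name), ("not", f), ("->", f, g), ("&", f, g),
#           ("+", f, g), ("<->", f, g)

def nnf(formula, positive=True):
    """Negation normal form: only &, + and negated atoms remain."""
    tag = formula[0]
    if tag == "atom":
        return formula if positive else ("not", formula)
    if tag == "not":
        return nnf(formula[1], not positive)
    if tag == "->":
        return nnf(("+", ("not", formula[1]), formula[2]), positive)
    if tag == "<->":
        both = ("&", ("->", formula[1], formula[2]), ("->", formula[2], formula[1]))
        return nnf(both, positive)
    left, right = nnf(formula[1], positive), nnf(formula[2], positive)
    if positive:
        return (tag, left, right)
    # De Morgan: a negated conjunction becomes a disjunction and vice versa.
    return ("+" if tag == "&" else "&", left, right)

def clauses(formula):
    """Clauses (frozensets of literals) of a formula already in NNF."""
    tag = formula[0]
    if tag == "&":
        return clauses(formula[1]) + clauses(formula[2])
    if tag == "+":
        # Distribute + over &: pair every clause on the left with every one on the right.
        return [c1 | c2 for c1 in clauses(formula[1]) for c2 in clauses(formula[2])]
    return [frozenset([formula])]  # a single literal

def clause_form(formula):
    return clauses(nnf(formula))

p, q, r = ("atom", "p"), ("atom", "q"), ("atom", "r")
for clause in clause_form(("->", p, ("&", q, r))):
    print(sorted(clause))
```

For (p->(q&r)) this yields the two clauses (¬p+q) and (¬p+r). Distribution can enlarge a formula exponentially, which is one reason the sketch should not be read as an efficient procedure.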


B.  A  Classification  of  Systems  of  Logic 

In  the  abstract,  it  is  traditional  to  divide  systems  of 
logic  into  three  sorts:  axiomatic,  "semantic",  and  natural 
deduction.  The  reason  for  this  division  seems  to  be  based  on 
a  variety  of  considerations,  which  do  not  always  cohere  in 
such  a  way  as  to  give  univocal  results.  In  the  next  section 
I  will  briefly  consider  the  reasons  one  might  offer  in 
support  of  this  classification,  but  first  we  should  look  at 
some examples so as to have some background exemplars to
discuss.

An  axiomatic  system  of  logic  takes  certain  formulae  as 
"given"  (in  the  sense  of  requiring  no  other  justification), 
gives  a  set  of  "rules  of  inference"  (methods  of  transforming 
one (or more) formulae into another), and defines a proof as
an ordered (finite) set of formulae, each one of which is an
axiom or follows from previous (in the ordering) formulae by
a rule of inference. To give an example of a proof in a
typical propositional axiomatic system, consider the system
P1 of Church (1956). Included in the axioms are
A1: (p → (q → p))

A2: ((s → (p → q)) → ((s → p) → (s → q)))

and the two rules of inference,

MP: from (A → B) and A, infer B

Sub: from A, if b is a propositional variable in A,
infer the result of replacing all occurrences of b in A
by a formula B

A proof of the theorem (p → p) in this system would be

1. ((s → (p → q)) → ((s → p) → (s → q)))    A2

2. ((s → (r → q)) → ((s → r) → (s → q)))    1, Sub (r for p)

3. ((s → (r → p)) → ((s → r) → (s → p)))    2, Sub (p for q)

4. ((p → (r → p)) → ((p → r) → (p → p)))    3, Sub (p for s)

5. ((p → (q → p)) → ((p → q) → (p → p)))    4, Sub (q for r)

6. (p → (q → p))                            A1

7. ((p → q) → (p → p))                      5, 6 MP

8. ((p → (q → p)) → (p → p))                7, Sub ((q → p) for q)

9. (p → p)                                  6, 8 MP
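
The derivation above can be replayed mechanically. The following is a minimal sketch in Python, not part of Church's presentation: formulae are encoded as nested tuples, and the names imp, substitute, modus_ponens and prove_p_implies_p are assumptions introduced only for illustration. It applies Sub and MP in the order given by the annotations and returns the nine lines of the proof.

```python
# A propositional variable is a str; an implication (A -> B) is the
# tuple ("->", A, B). This encoding is an assumption of the sketch.

def imp(a, b):
    """Build the formula (a -> b)."""
    return ("->", a, b)

def substitute(formula, var, replacement):
    """Rule Sub: replace every occurrence of variable `var` by `replacement`."""
    if isinstance(formula, str):
        return replacement if formula == var else formula
    _, left, right = formula
    return imp(substitute(left, var, replacement),
               substitute(right, var, replacement))

def modus_ponens(conditional, antecedent):
    """Rule MP: from (A -> B) and A, return B; fail if the premises do not fit."""
    if isinstance(conditional, str) or conditional[1] != antecedent:
        raise ValueError("MP does not apply to these premises")
    return conditional[2]

A1 = imp("p", imp("q", "p"))
A2 = imp(imp("s", imp("p", "q")), imp(imp("s", "p"), imp("s", "q")))

def prove_p_implies_p():
    """Replay the nine-line proof of (p -> p), one line per step."""
    l1 = A2
    l2 = substitute(l1, "p", "r")            # 1, Sub (r for p)
    l3 = substitute(l2, "q", "p")            # 2, Sub (p for q)
    l4 = substitute(l3, "s", "p")            # 3, Sub (p for s)
    l5 = substitute(l4, "r", "q")            # 4, Sub (q for r)
    l6 = A1
    l7 = modus_ponens(l5, l6)                # 5, 6 MP
    l8 = substitute(l7, "q", imp("q", "p"))  # 7, Sub ((q -> p) for q)
    l9 = modus_ponens(l8, l6)                # 6, 8 MP
    return [l1, l2, l3, l4, l5, l6, l7, l8, l9]
```

Note that the sketch only checks a proof that has already been found; it says nothing about how one would discover the sequence of substitutions, which is the difficulty with axiomatic systems that the text goes on to stress.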

The  extension  of  the  axiomatic  method  to  the  predicate 
calculus  is  accomplished  by  adding  further  axioms  or  rules 
of inference.

"Semantic" systems of logic are so-called because they
attempt to mirror the intended semantical interpretation in
the system of logic itself. For the propositional logic,
this intended semantical interpretation is just the truth
table; and consequently, to prove whether a formula A is a
theorem or not, it is customary in these systems to
introduce devices which enable one to find out whether the
assumption that A is not a theorem would also require that
some atomic sentence and its negation both be assigned true.
This is normally done by "breaking down" the formula ¬A into
simpler and simpler components. Many of these methods can
easily be represented by trees. Jeffrey (1967) has the
following system of rules for tree construction:


(A & B)      (A v B)      (A → B)      ¬¬A
   A          /   \        /   \        A
   B         A     B     ¬A     B


¬(A & B)     ¬(A v B)     ¬(A → B)
  /   \         ¬A            A
¬A     ¬B       ¬B           ¬B

where the intuitive idea behind a rule is that we are
interested in "ways the complex formula might be true." The
definition of a proof of conclusion C is: (1) ¬C is the root
node, (2) if one wishes to have premises, they are added to
the root node, (3) if a rule of inference is applied to a
formula B which occupies a node of the tree, the result of
the rule of inference is represented in every "uncancelled"
branch that B dominates, (4) any branch which contains an
atomic formula and also its negation is "cancelled" by
putting an "x" at the bottom of the branch, (5) every branch
is cancelled. C is a theorem if and only if there is a proof
of it. The proof of the formula (p → p) is very simple in this
system. We put ¬(p → p) as the root node and use the rule for
¬(A → B):

¬(p → p)
    p
   ¬p
    x

A somewhat more interesting theorem is DeMorgan's
(¬(p → q) ↔ (p & ¬q)) which is proved:

           ¬(¬(p → q) ↔ (p & ¬q))
             /                \
      ¬(p → q)              ¬¬(p → q)
      ¬(p & ¬q)             (p & ¬q)
          p                     p
         ¬q                    ¬q
       /     \               (p → q)
     ¬p      ¬¬q             /     \
      x       x            ¬p       q
                            x       x

It  is  quite  clear  here  (as  opposed  to  the  axiomatic  system) 
what  the  strategy  is:  one  assumes  the  (alleged)  theorem  to 
be  false  and  breaks  it  down  into  simpler  and  simpler 
components  by  the  truth-preserving  rules  (the  branched 
formula  is  true  if  and  only  if  at  least  one  of  its 
subbranches  is  true).  Since  the  resulting  formulae  get 
shorter  and  shorter,  the  method  (in  the  propositional 
calculus)  is  guaranteed  to  halt. 
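
The tree method just described is easy to mechanize for the propositional case. The following is a minimal sketch in Python under stated assumptions: formulae are nested tuples tagged "not", "and", "or", "imp" and "iff"; the function names expand, closes and is_theorem are illustrative; and the two biconditional rules are taken in their usual tree form, which is an assumption of the sketch rather than something shown in the rule table above. A branch is cancelled as soon as it contains an atomic formula and its negation.

```python
# Atoms are str; compound formulae are tuples such as ("imp", A, B).

def expand(formula):
    """Return the branches a tree rule produces, or None for a literal.

    Each branch is the list of formulae the rule adds to that branch.
    """
    if isinstance(formula, str):
        return None
    op = formula[0]
    if op == "and":
        return [[formula[1], formula[2]]]
    if op == "or":
        return [[formula[1]], [formula[2]]]
    if op == "imp":
        return [[("not", formula[1])], [formula[2]]]
    if op == "iff":  # assumed rule: both true, or both false
        a, b = formula[1], formula[2]
        return [[a, b], [("not", a), ("not", b)]]
    inner = formula[1]  # op == "not"
    if isinstance(inner, str):
        return None  # a negated atom is a literal
    iop = inner[0]
    if iop == "not":
        return [[inner[1]]]
    if iop == "and":
        return [[("not", inner[1])], [("not", inner[2])]]
    if iop == "or":
        return [[("not", inner[1]), ("not", inner[2])]]
    if iop == "imp":
        return [[inner[1], ("not", inner[2])]]
    if iop == "iff":  # assumed rule: exactly one of the two is true
        a, b = inner[1], inner[2]
        return [[a, ("not", b)], [("not", a), b]]
    raise ValueError("unknown connective: %r" % (iop,))

def closes(todo, literals):
    """True if every branch through the pending formulae is cancelled."""
    if not todo:
        return False  # nothing left to break down: the branch stays open
    formula, rest = todo[0], todo[1:]
    branches = expand(formula)
    if branches is None:
        negation = ("not", formula) if isinstance(formula, str) else formula[1]
        if negation in literals:
            return True  # atomic formula and its negation on one branch
        return closes(rest, literals | {formula})
    return all(closes(branch + rest, literals) for branch in branches)

def is_theorem(conclusion, premises=()):
    """Root the tree at the negated conclusion together with any premises."""
    return closes([("not", conclusion), *premises], frozenset())
```

For example, is_theorem(("imp", "p", "p")) rebuilds the three-node tree shown earlier. The recursion halts for the reason given in the text: every rule replaces a formula by strictly shorter ones.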

In computerized theorem proving, the most commonly used
method is "resolution" -- a variant of the semantic methods.
Here (in the propositional logic case) one negates the
formula to be proved and represents it by its equivalent
clause form. (Recall that this is obtained if the formula is
converted to skolem normal form and then to a conjunction of
disjunctions of literals; each conjunct being called a
clause). Each clause (which is itself in disjunctive normal
form) is written on a separate line and the one rule of
inference, "resolution",4 is used. If premises are to be
used, they too are converted to clause form and each clause
is written on a separate line. The rule of resolution is (in
its simplest statement)

A1 v B1 v ... v P1 v ... v Z1

A2 v B2 v ... v ¬P1 v ... v Z2
---------------------------------------
A1 v B1 v ... v Z1 v A2 v B2 v ... v Z2

where: each of the lines is in clause form and the
conclusion (a new clause) has no mention of P1 or its
negation  (it  has  been  "resolved  out").  This  new  clause  which 
has  been  created  is  added  to  the  list  of  clauses  and  the 
resolving  procedure  is  continued.  If  the  original  formula 
was  a  theorem,  then  eventually  the  method  will  yield  a  null 
resolvent  --  the  "empty  formula",  a  formula  with  no 
subformulae.  (The  rule  is  usually  generalized  to  apply  to  an 
arbitrary  number  of  premises  at  one  swoop.)  Clearly  the 
resolution  method  is  semantic  in  nature:  one  is  trying  to 
discover  whether  the  purported  statement  is  necessarily  true 
by  looking  at  ways  its  negation  might  be  true.  If  none  are 
found  (null  resolvent),  the  negation  can't  be  true  and  so 
the  original  statement  must  be. 
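
The propositional procedure can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not Robinson's formulation: a literal is a string such as "p" or "~p", a clause is a frozenset of literals, and the names negate, resolvents and refutes are introduced only for this example. The loop keeps adding resolvents until the null resolvent appears or nothing new can be derived.

```python
from itertools import combinations

def negate(literal):
    """Return the complementary literal: p <-> ~p."""
    return literal[1:] if literal.startswith("~") else "~" + literal

def resolvents(c1, c2):
    """All clauses obtained by resolving c1 against c2 on one literal."""
    return [
        (c1 - {lit}) | (c2 - {negate(lit)})
        for lit in c1
        if negate(lit) in c2
    ]

def refutes(clauses):
    """True if the empty clause (the null resolvent) can be derived."""
    known = {frozenset(c) for c in clauses}
    if frozenset() in known:
        return True
    while True:
        new = set()
        for c1, c2 in combinations(known, 2):
            for resolvent in resolvents(c1, c2):
                if not resolvent:
                    return True  # null resolvent: the clause set is contradictory
                new.add(resolvent)
        if new <= known:
            return False  # saturated without deriving the null resolvent
        known |= new
```

To test whether a formula is a theorem one passes in the clause form of its negation. For instance, negating ((p → q) → (¬q → ¬p)) gives the clauses {~p, q}, {~q} and {p}, and refutes([{"~p", "q"}, {"~q"}, {"p"}]) finds the null resolvent in two steps. Exhaustive saturation is chosen here only for brevity; it terminates because finitely many clauses can be built from a finite set of literals.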

4 Introduced into the literature by Robinson (1965).

These semantic methods can be extended to the (non-decidable)
predicate calculus in various ways. Jeffrey (1967) adds
branching rules for quantifiers, but these rules are not
effective in the sense that they needn't ever be used
again.5 Another way would be to convert the formula into a
skolem normal form. This resulting (non-quantified) formula
can now be treated in various ways. One could apply the tree
method of above,6 or one could continue to use the
resolution procedure by introducing a special understanding
of what variables can resolve against which and generate the
null resolvent. Such methods will be considered in the next
chapter. Even in the complex case of quantifiers it should
be clear that the strategy is semantic: quantifiers are
interpreted -- existential quantifiers not in the scope of a
universal are replaced by a name (the thing in the model
that the sentence asserts the existence of), existential
quantifiers in the scope of a universal quantifier are
replaced by a function of the things named in the model by
the universal quantifiers, and so on. Finally we merely look
at the possible co-truth of the atomic formulae.

5 All formulae in a proof have the property that they need be
"branched" at most once, except for universally quantified
formulae.

6 This would require that each clause be considered a
universally quantified formula over the variables mentioned
in the matrix.

A natural deduction system is like the semantic systems
and unlike the axiomatic systems in that it has no
unjustified statements (axioms) and in that it has a large
number of rules of inference; however, it is unlike the
semantic systems in that it does not attempt to "break
formulae down" into simple components and evaluate their
possible co-truth. Rather, the rules of inference are
supposed to correspond to psychologically plausible modes of
reasoning. A proof is a method of breaking down a formula
into "what you can assume" and "what still needs to be
proved", together with methods to actually do some of the
"proving". There are a number of these natural deduction
systems in the literature; I shall here present (and later
employ) the one found in Kalish & Montague (1964). For the
propositional logic, the rules of inference are:


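(R)    from A, infer A
(DN)   from A, infer ¬¬A; and from ¬¬A, infer A
(S)    from (A & B), infer A; and from (A & B), infer B
(MP)   from (A → B) and A, infer B
(MT)   from (A → B) and ¬B, infer ¬A
(Adj)  from A and B, infer (A & B)
(MTP)  from (A v B) and ¬A, infer B; and from (A v B) and ¬B,
       infer A
(CB)   from (A → B) and (B → A), infer (A ↔ B)
(BC)   from (A ↔ B), infer (A → B); and from (A ↔ B),
       infer (B → A)
(Add)  from A, infer (A v B); and from A, infer (B v A)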

which abbreviations stand for, respectively, R: Repetition,
DN: Double Negation, S: Simplification, MP: Modus Ponens,
MT: Modus Tollens, Adj: Adjunction, MTP: Modus Tollendo
Ponens, CB: Conditionals to Biconditional, BC: Biconditional
to Conditional, Add: Addition. These are all taken to be
psychologically plausible modes of reasoning. An antecedent
line is defined as a line which is earlier in the proof and
neither boxed nor containing an uncancelled 'show' (both
defined below). A proof is defined as:

1. If φ is a formula, then

      Show φ

   can occur as a line. (The 'show' is uncancelled.
   Intuitively we are setting the task of proving φ.)

2a. If Show φ occurs as a line then ¬φ can occur as the next
   line ("assume the negation").

2b. If Show ¬φ occurs as a line, then φ can occur as the
   next line.

2c. If Show (φ → ψ) occurs as a line, then φ can occur as the
   next line ("assume the antecedent").

3. If φ follows from antecedent lines by a rule of
   inference, then φ may be entered as the next line.

4. If the proof has a subpart which looks like

      Show φ
      X1
      .
      .
      Xn

   and (a) there are no uncancelled 'Show' among X1...Xn,
   and (b) either φ occurs unboxed (defined below) among
   X1...Xn, or else both ψ and ¬ψ occur unboxed among
   X1...Xn, then

      *Show φ
      X1
      .
      .
      Xn

   can be the next step in the proof. (X1...Xn are now boxed
   -- and thus are no longer antecedent -- and the 'Show'
   line is cancelled and now antecedent; intuitively, the
   lines in the box constitute a proof of φ.)

5. If the proof has a subpart which looks like

      Show (φ → ψ)
      X1
      .
      .
      Xn

   and (a) there are no uncancelled "Show" among X1...Xn,
   and (b) ψ occurs unboxed among X1...Xn, then

      *Show (φ → ψ)
      X1
      .
      .
      Xn

   may occur as the next step of the proof.

6. A premise may be entered anywhere in the proof.

7. The formula φ is proved if it occurs unboxed in a proof
   and there are no uncancelled "Show" in the proof.

This natural deduction system is extended to the
predicate calculus by adding rules for Quantifier Negation
(QN), Existential Instantiation (EI), Universal
Instantiation (UI), Existential Generalization (EG), and
another method of boxing and cancelling called "universal
derivation". These rules of inference are:

(QN)  from ¬(Aα)φ, infer (Eα)¬φ; from ¬(Eα)φ, infer (Aα)¬φ;
      from (Aα)¬φ, infer ¬(Eα)φ; and from (Eα)¬φ,
      infer ¬(Aα)φ
(EI)  from (Eα)φ, infer φ'
(UI)  from (Aα)φ, infer φ'
(EG)  from φ', infer (Eα)φ

where φ' is the result of replacing all free occurrences of
α in φ by some term in such a way that this new term does
not become bound by any quantifier in φ'. Furthermore, in
the case of EI, the substituted term must be a variable
which is entirely new to the proof as thus far constructed.
The new rule for boxing and cancelling is:

If the proof has a subpart which looks like

   Show (Aα)φ
   X1
   .
   .
   Xn

and (a) there are no uncancelled "Show" among X1...Xn,
and (b) φ occurs unboxed among X1...Xn, and (c) α does
not occur free in any line antecedent to this "show"
line, then

   *Show (Aα)φ
   X1
   .
   .
   Xn

may occur as the next step of the proof.

I close this section with some short proofs to give a
feeling for how theorems might be proved using this system,
followed by the proofs of the same theorems in a resolution
system. First, the theorem ((p → q) → (¬q → ¬p)), which was the
longest proof completed by the Logic Theorist (to be
described in the next chapter).

1. *Show ((p → q) → (¬q → ¬p))
2. (p → q)                        Assumption
3. *Show (¬q → ¬p)
4. ¬q                             Assumption
5. ¬p                             2, 4 MT


Second, the theorem (p v ¬¬¬p), which one can show the
Logic Theorist's heuristics are incapable of proving.

1. *Show (p v ¬¬¬p)
2. ¬(p v ¬¬¬p)                    Assumption
3. *Show ¬p
4. p                              Assumption
5. (p v ¬¬¬p)                     4, Add
6. ¬(p v ¬¬¬p)                    2, R
7. ¬¬¬p                           3, DN
8. (p v ¬¬¬p)                     7, Add


And finally a simple problem involving quantifiers.

1. *Show ((Ax)(Px → Qx) → ((Ax)Px → (Ax)Qx))
2. (Ax)(Px → Qx)                  Assumption
3. *Show ((Ax)Px → (Ax)Qx)
4. (Ax)Px                         Assumption
5. *Show (Ax)Qx
6. Px                             4, UI
7. (Px → Qx)                      2, UI
8. Qx                             6, 7, MP


The proofs of these same theorems in a resolution
system would go like this:

To prove ((p → q) → (¬q → ¬p)):

1. (¬p v q)   --clause from negation of conclusion
2. ¬q         --clause from negation of conclusion
3. p          --clause from negation of conclusion
4. q          --resolve 1 and 3
5. □          --resolve 2 and 4, yielding null resolvent

To prove (p v ¬¬¬p):

1. ¬p         --clause from negation of conclusion
2. p          --clause from negation of conclusion
3. □          --resolve 1 and 2, yielding null resolvent

To prove ((Ax)(Px → Qx) → ((Ax)Px → (Ax)Qx)):

1. (¬Px v Qx) --clause from negation of conclusion
2. Px         --clause from negation of conclusion
3. ¬Qa        --clause from negation of conclusion
4. Qx         --resolve 1 and 2
5. □          --resolve 3 and 4


C. The "Axiomatic vs. Semantic vs. Natural Deduction" Division

The division in the previous section of systems of
logic into axiomatic, semantic, and natural deduction was
done as a matter of classificatory convenience. However, it
is also a classification with which I have some considerable
sympathy. Indeed, I will later argue that it is natural
deduction to which we should turn our attention. However, I
also recognize that the classification is not completely
clear and the reasons I will adduce for restricting our
attention do not apply with equal force to all versions of
systems within each classification. This section therefore
is an attempt to discuss the classificatory problems.

First let me say that I think there are two distinct
types of reasons one could appeal to in preferring natural
deduction to the other types of systems, and that I do use
both types of reason. One has to do with a desire to try to
follow the mode of reasoning employed by practitioners of
logic and mathematics, while the other has to do with a
theoretical appeal to "what logic really is". So far as the
first reason goes, we wish to divide methods of proof into
those which are (or: can easily be) employed by ordinary
practitioners of logic and those which are not. This
division, as I see it, places the axiomatic systems and
various of the semantic systems (especially resolution-based
systems) on the side of "impractical", and natural deduction
and various of the semantic systems (e.g., Jeffrey's tree
method) on the side of "easy to use". The second reason, I
claim, places various of the semantic systems (e.g.,
Jeffrey's system, truth table lookup, and semantic tableaux
methods) on the side of "not really logic", and natural
deduction, axiomatic systems, and various semantic methods
(e.g., resolution based systems) on the side of "true
logic". If this is correct, it is seen that only natural
deduction systems meet both of the desirability conditions.

Although  it  will  be  discussed  again  later,  I  remark  now 
that  the  first  reason  --  ease  of  use  by  people  --  clearly 
does  classify  the  systems  as  I  mentioned.  Axiomatic  systems 
and  resolution  systems  are  very  difficult  for  most  people, 
even  accomplished  logicians,  to  use;  the  natural  deduction 
systems  and  the  other  semantic  methods  are  recognized  as 
being  more  akin  to  ordinary  use.  It  is  the  second  reason, 
and  how  it  classifies,  that  I  wish  to  discuss  in  this 
section.

On  an  intuitive  level,  the  "axiomatic  vs.  semantic  vs. 
natural  deduction"  division  seems  well-motivated  and  clear. 
But  when  one  tries  to  give  criteria  which  will  unambiguously 
classify  an  arbitrary  system  of  logic,  the  clearness  of  the 
divisions  begins  to  vanish.  But  even  though  it  be  difficult 
to  state  the  criteria,  it  seems  to  me  that  the  division  has 
sufficient  intuitive  pull  that  it  should  be  kept.  To  that 
end  then,  I  shall  state  what  I  take  to  be  some  of  the 
hallmarks  of  each  type  of  system,  trying  to  justify  placing 
the various systems discussed in the last section into the
categories I did. Simultaneously, however, I will present
counterevidence to the adequacy of the proffered criteria,
thereby showing that the different types of systems seem to
collapse into one another.

So  on  the  one  hand  it  seems  clear  that  there  are  (at 
least) these three different types of systems; on the other
hand  not  much  of  methodological  importance  can  be  generated 
from  the  distinctions  until  and  unless  better  criteria  are 
discovered.  At  the  end  I  will  make  some  highly  abstract 
remarks  about  what  I  take  the  point  of  logic  to  be.  Given 
the  view  just  espoused,  that  nothing  of  methodological 
significance  is  forthcoming  from  the  distinctions,  these 
abstract  remarks  are  somewhat  of  an  anomaly.  For  I  propose 
to  argue  against  pursuing  axiomatic  and  semantic  systems  (in 
Chapter  IV);  and  this  argument  is  based  on  my  comments 
concerning  the  point  of  logic  and  how  it  relates  to  the 
three  types  of  systems. 

One  way  of  classifying  systems  of  logic  is  according  to 
whether  they  admit  any  formulae  to  be  "not  in  need  of 
proof",  i.e.,  by  whether  they  admit  axioms.  This  seemingly 
clear-cut  criterion  would  separate  axiomatic  systems  from 
the  others  (semantic  and  natural  deduction).  However,  even 
this  apparently  obvious  distinction  has  been  challenged. 
McCawley  (1981:  44-46),  for  example,  remarks  that  one  could 
look  at  axioms  as  being  a  special  kind  of  rule  of  inference 
--  namely  the  kind  which  have  no  premises.  So  there  would  be 
no difference in kind between the three-premise rule of [...]
seems to me that there is nevertheless considerable value in
retaining a distinction between axiomatic systems and the
others. Mostly this has to do with ease of psychological
processing of proofs, but this is not the place to discuss
such matters.

It is perhaps also worth noting that the other systems might
be considered converted to axiomatic systems by replacing
rules of inference in them by axioms. For example the rule

(P → Q), ¬Q ⊢ ¬P

might be replaced by the axiom7

⊢ (p → q) → (¬q → ¬p)

So, while the axiomatic/non-axiomatic distinction is
difficult to maintain theoretically, it seems clear that
there is such a distinction.

7 Of course there still have to be some rules of inference
(as Lewis Carroll's tale of Achilles and the tortoise
teaches us). Here I have in mind retaining MP and Sub.
Furthermore, any rule which violates the Deduction Theorem
must be retained as a rule, not as an axiom. (See the
section below on the Deduction Theorem.)

More  vexed  is  the  distinction  between  semantic  and 
natural  deduction  systems.  It  is  perhaps  instructive  to 
begin  by  giving  two  clear  reasons  one  might  call  a  system 
semantic,  and  what  the  intent  is  in  calling  a  system  a 
natural  deduction  system.  We  shall  then  see  that  it  is  easy 
to  construct  a  seemingly  unbroken  continuum  between  the  two. 
While  this  might  cause  one  to  doubt  the  legitimacy  of  the 
distinction,  I  shall  point  to  the  clear  instances  of  each  in 
my  attempt  to  justify  classifying  Kalish  &  Montague  as 
natural  deduction  and  resolution  systems  as  semantic. 

One  clear  sense  of  "semantic  system"  occurs  when  the 
prover  refers  to  some  representation  of  an  instance  of  what 
is  under  debate;  and,  referring  to  the  instance,  makes 
claims  about  it  and  generalizes  to  obtain  a  proof  of  the 
theorem.  I  have  in  mind  here  such  programs  as  geometry 
provers  which,  when  constructing  an  argument  about  (say) 
equilateral  triangles  in  general,  actually  construct  a 
representation  of  a  particular  equilateral  triangle,  prove 
claims  about  it,  and  make  various  constructions  within  it. 
Having  proved  the  relevant  claim  about  the  particular 
triangle,  the  prover  generalizes  to  all  equilateral 
triangles.

Another clear sense of "semantic system" occurs when
the rules of inference each perfectly mirror the intended
semantic interpretation. Thus, for example, in a
propositional logic system where the semantic interpretation
is  just  the  truth  table  analysis  of  the  formula,  a  system 
whose  "rules  of  inference"  were  to  construct  truth  tables 
and  analyze  them  would  certainly  be  considered  "semantic". 
However,  this  is  a  clear  case.  One  can  deviate  from  the 
clear  case  and  still  be  convinced  one  has  a  semantic  system. 
For  example,  if  the  above  system  were  to  first  negate  the 
formula  and  do  a  truth  table  analysis  on  this  negation,  and 
claim  it  to  be  a  theorem  if  the  negation  were  uniformly 
false,  this  would  be  a  clearly  semantic  system.  A  little 
further  down  this  continuum  are  systems  which  use  various 
shortcut  methods  on  the  truth  tables:  e.g.,  they  use  methods 
of  evaluation  such  as  "instead  of  looking  at  the  truth  or 
falsity of a conditional (P → Q), look rather at the truth or
falsity of ¬P and the truth or falsity of Q". There are a
variety  of  such  "shortcuts"  available.  Indeed,  Jeffrey's 
system  of  the  last  section  has  one  such  shortcut  for  each 
connective  and  its  negation.  For  this  reason  it  is  properly 
called  "semantic". 
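
The "clear case" described here, a system whose only rule is to build and inspect a truth table, can be sketched as follows. This is a minimal illustration in Python under stated assumptions: formulae are nested tuples tagged "not", "and", "or", "imp" and "iff", and the names atoms, evaluate, is_tautology and is_theorem_by_negation are introduced only for this example. The last function is the variant mentioned in the text, which negates the formula and checks that the negation is uniformly false.

```python
from itertools import product

def atoms(formula):
    """Collect the propositional variables occurring in a formula."""
    if isinstance(formula, str):
        return {formula}
    return set().union(*(atoms(part) for part in formula[1:]))

def evaluate(formula, valuation):
    """Compute the truth value of a formula under one row of the table."""
    if isinstance(formula, str):
        return valuation[formula]
    op = formula[0]
    if op == "not":
        return not evaluate(formula[1], valuation)
    left = evaluate(formula[1], valuation)
    right = evaluate(formula[2], valuation)
    if op == "and":
        return left and right
    if op == "or":
        return left or right
    if op == "imp":
        return (not left) or right
    if op == "iff":
        return left == right
    raise ValueError("unknown connective: %r" % (op,))

def rows(formula):
    """Yield every assignment of truth values to the formula's variables."""
    names = sorted(atoms(formula))
    for values in product([True, False], repeat=len(names)):
        yield dict(zip(names, values))

def is_tautology(formula):
    """A theorem is a formula that is true in every row."""
    return all(evaluate(formula, row) for row in rows(formula))

def is_theorem_by_negation(formula):
    """Equivalent test: the negation is false in every row."""
    negation = ("not", formula)
    return not any(evaluate(negation, row) for row in rows(negation))
```

The two tests always agree, which is the sense in which negating first leaves the system "clearly semantic"; the shortcut rules discussed in the text replace the full enumeration of rows without changing what is being checked.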

The  hallmark  of  natural  deduction  systems  is  that  the 
rules  of  such  systems  are  supposed  to  mirror  actual  (valid) 
reasoning  patterns  which  people  employ.  In  general  these 
patterns  of  valid  reasoning  are  taken  to  be  "universal"  in 
scope  and  to  apply  to  any  subject  matter.  For  that  reason 
there  is  an  attempt  to  view  the  rules  as  being  completely 
separate  from  any  semantic  interpretation  which  might  be 
placed upon them. In this respect, natural deduction systems
are close to the usual understanding of axiomatic systems
(one  is  to  ignore  any  interpretation  which  might  be  placed 
on  these  systems),  and  distinct  from  the  central  point  of 
semantic  systems  (where  one  is  constantly  to  make  reference 
to  the  intended  interpretation  of  (say)  the  premises  of  the 
argument  under  consideration). 

In  spite  of  the  fact  that  the  semantic/natural 
deduction  distinction  seems  clear  when  one  focuses  on  the 
characterizations  offered  in  the  last  few  paragraphs,  it  is 
merely  a  matter  of  focus.  Thus,  if  one  focuses  on  the 
question  "Why  is  this  step  in  this  proof  valid?",  one  will 
get  an  answer  like  the  following  from  someone  who  uses  a 
semantic  system:  "When  you  attend  to  what  the  objects  under 
consideration  are  (the  interpretation  of  the  symbols),  you 
will  be  immediately  forced  to  acknowledge  that  the  step  must 
follow."  Now  contrast  this  with  the  natural  deduction 
answer:  "The  step  is  valid  because  it  follows  a  universally 
valid  pattern  of  thought."  One  is  tempted  to  ask,  "Why  do 
you  think  this  pattern  valid?"  And  the  answer  will 
inevitably  be  some  variant  of  "You  can't  imagine  a  situation 
in  which  it  would  fail,"  or  equivalently,  "Given  the 
interpretation  placed  on  the  symbols,  it  could  not  be 
otherwise."  Here  we  see  the  two  views  of  logic  being  forced 
to  the  side  of  semantic  interpretation  when  the  focus  is 
upon the justification of the rules of inference.

However, one might wish to look upon the issue from a
different point of view -- that of matching abstract
patterns. According to this way of focusing on the issue, we
are  not  looking  at  the  interpretation  of  the  symbols,  but 
rather  at  whether  the  system  allows  us  to  view  a  derivation 
as  a  series  of  steps  each  of  which  obeys  a  certain  statable 
pattern.  Looking  at  the  matter  this  way  --  whether  the 
system  is  a  matter  of  "symbol  rearrangement"  --  the  answer 
(for  most  systems  anyway)  is  yes.  Even  so  semantic  a  system 
as  Jeffrey’s,  with  its  exact  mirroring  of  each  type  of  truth 
table,  would  be  a  system  in  which  one  looks  to  whether 
formulae  manifest  certain  patterns.  Perhaps  on  this  view, 
only  systems  that  explicitly  look  to  something  besides  the 
formulae  would  be  semantic.  (The  geometry  example  and  the 
explicit  truth  table  analysis  example  might  still  be  called 
semantic).

Yet  there  seems  a  clear  sense  in  which  (say)  Jeffrey’s 
system  is  semantic.  It  seems  to  me  to  be  a  matter  of  whether 
the  rules  of  inference  in  the  system  closely  follow  the 
intended  semantic  interpretation  versus  whether  they  are 
taken  to  be  independently  statable.  And  if  this  be  right, 
then  there  is  a  continuum  along  which  different  systems  can 
be placed, from "clearly semantic" to "clearly natural
deduction".8


8 Looked at in this way, the axiomatic systems can be put
anywhere along the continuum, depending on the extent to
which the axioms and rules of inference follow the semantic
interpretation. Most examples, however, are closer to the
pattern matching end (such as the system mentioned in the
last section).

So, given a system which is not at one extreme or the
other, what grounds can be given for calling it "semantic"
or "natural deduction"? It seems to me that this is a matter
of taste, but that there are certain guidelines. One
guideline must be whether the patterns exemplified in its
rules correspond to independently given, psychologically
plausible modes of reasoning. Another guideline is whether
the rules of inference closely follow the semantic
interpretation in a step by step manner. Doubtless one could
find others. The most contentious classificatory decision I
made  in  the  last  section  was  probably  placing  resolution 
systems  into  the  semantic  category.  But  according  to  the  two 
guidelines  just  given,  that  is  exactly  where  it  goes.  The 
rule  of  resolution  is  certainly  not  psychologically 
plausible  from  any  independent  point  of  view.  (If  it  were, 
it  would  have  been  discussed  by  some  logicians  before  1960. 
All  the  usual  natural  deduction  rules  were  discussed  by  the 
ancient  Greeks,  the  mediaeval  grammarians,  Boole,  de  Morgan, 
Frege,  Russell,  etc.)  And  while  the  resolution  systems  do 
not  precisely  mirror  the  semantic  interpretation  to  the  same 
degree  as  Jeffrey's  system,  or  semantic  tableaux,  there  is 
still  a  close  correspondence:  one  is  trying  to  see  whether 
there  is  a  possible  counterexample.  One  does  this  by 
constructing  formulae  representing  what  such  counterexamples 
would  look  like,  interpreting  quantifiers  as  describing 
certain  entities  in  the  domain,  and  the  like. 

I  will  now  state  dogmatically  that  I  think  "true  logic" 
to  be  a  matter  of  applying  general  reasoning  patterns, 
patterns which are universal and apply to any subject
matter, to some specific problem at hand. That is, it is
only  those  systems  on  the  natural  deduction  end  of  our 
continuum  that  I  take  to  be  "really  logic."  The  semantic 
systems  are  no  more  than  a  way  of  encoding  facts  about  the 
specific  problem  at  hand  and  applying  some  tools  that  work 
for  that  interpretation.  And  this  is  so  even  with 
"universal"  semantic  systems  such  as  resolution-based 
systems.  Even  though  resolution  systems  can  be  applied  to 
any  domain  or  interpretation,  they  amount  to  just  a  clever 
way  of  restating  the  interpretation  and  working  on  this 
restatement  (as  Jeffrey’s  system  is  just  a  clever  way  of 
restating  truth  tables  and  evaluating  them).  There  are 
doubtless  uses  for  semantic  systems,  but  the  study  of  logic 
and  logical  theory  is  not  one  of  them,  especially  if  one  is 
concerned to describe human reasoning.9

D.  Arguments  and  Theorems 

It  will  not  have  escaped  notice  that  I  have  been 
somewhat  inconsistent  in  my  use  of  'theorem'  and  'theorem 
prover.'  On  the  one  hand,  in  logic  a  theorem  is  a  formula 
which  can  be  proved  in  a  certain  way  (namely,  from  no 
premises),  while  on  the  other  hand  the  examples  given  in 
Chapter  I  of  "theorems"  would  have  it  that  a  theorem  is  what 
follows  logically  from  certain  other  given  formulae  (the 
premises).  More  mysterious,  perhaps,  is  the  use  of  'theorem' 

9 In Chapter IV I again consider these types of systems and
give  further  reasons  for  my  choice  of  a  natural  deduction 
system  as  what  I  wish  to  investigate.

in  axiomatic  systems  to  describe  what  follows  from  the
axioms  and  no  other  premises.  Let  us  therefore  be  a  little 
more  precise  here,  but  recognize  that  the  literature  on 
mechanical  deduction  is  itself  somewhat  ambiguous  on  this 
terminological  issue,  and  that  I  intend  to  maintain  this 
ambiguity. 

Let  us  start  with  natural  deduction  systems.  One  of  the 
rules  of  proof  in  Kalish  &  Montague's  system  is  that  a 
premise  can  be  introduced  anywhere  in  a  proof.  If  this  rule 
is  not  employed  anywhere  in  a  given  proof  (as  for  example  it 
was  not  used  in  the  examples  given  at  the  end  of  the  last 
section),  then  the  formula  being  proved  is  a  theorem  in  the 
technical  sense.  If  the  rule  was  employed,  then  the  formula 
being  proved  depends  upon,  as  it  were,  the  premise(s)  used. 
Semantically  speaking,  the  formula  proved  has  merely  been 
shown  "true  if  the  premise  is  true";  whereas  if  the  rule  had 
not  been  employed  then  the  formula  proved  is  true 
(simpliciter). Now, had the premise itself been a theorem
(in  the  technical  sense)  then  of  course  the  proved  formula 
would  also  be;  and  if  one  did  not  know  whether  the  premise 
were  a  theorem,  one  might  say  (in  a  somewhat  loose  sense) 
that  the  proved  formula  follows  as  a  theorem  from  the 
premise.  I  suspect  it  is  this  sort  of  loose  usage  that  has 
been  picked  up  in  the  automatic  theorem  proving  literature 
and  which  leads  to  the  ambiguity  noted  above.  A  similar 
situation  occurs  in  the  semantic  proof  procedures.  I  noted 
in  the  last  section  that  to  prove  a  theorem  (in  the
technical  sense)  in  Jeffrey's  system,  one  negates  it,  roots
a  tree  with  this  negation,  and  applies  the  various  branching 
rules.  One  can  introduce  premises  into  this  scheme  by  merely 
adding  them  to  the  root  node  and  have  the  branching  rules 
apply  to  all  the  formulae.  Again  one  wants  to  say  that  if 
the  premises  were  added  to  the  root  (and  all  branches 
closed)  then  the  conclusion  follows  as  a  theorem  from  the 
premises;  but  if  no  premises  were  used  then  the  proved 
formula  simply  is  a  theorem  (technical  sense).  Similarly,  in 
a  resolution  system  if  clauses  corresponding  to  premises  are 
used,  then  (if  the  null  resolvent  is  reached)  the  proved 
formula  follows  as  a  theorem  from  the  premises.  In  the 
axiomatic  system  exhibited  in  the  last  section,  it  is 
important  to  note  that  the  propositional  letters  used  in  the 
axioms  are  propositional  variables  and  can  have  as 
substitution  instances  any  formula  with  (or  without)  such 
variables. The formula actually proved there, $(p \to p)$, was
itself  composed  exclusively  of  propositional  variables. 
Anything  which  can  be  proved  from  those  axioms  and  the  rules 
of  inference  is  a  theorem  (technical  sense).  But  premises 
contain  propositional  constants,  i.e.,  the  propositional 
letters in them are, semantically speaking, interpreted.
Hence  substitution  is  not  allowed  on  them.  One  extends  the 
definition  of  proof  so  that  a  premise  can  be  entered  as  any 
line,  but  no  propositional  constant  can  have  the  rule  of 
substitution  used  on  it.  The  last  line  of  such  a  proof  will 
be a formula that, again, follows as a theorem from the
premises.

If  one  wishes  to  be  more  technical,  one  should  use 
'theorem'  in  the  sense  described  as  "technical"  in  the  last 
few  paragraphs,  and  use  'is  a  valid  conclusion  from  the 
premises'  for  the  sense  described  here  as  "loose".  I  shall, 
however,  not  be  technical,  except  in  cases  where  the  context 
does  not  completely  determine  which  sense  is  at  issue. 

E.  The  Deduction  Theorem 

When  one  extends  the  notion  of  theorem  proving  to  cover 
arguments  with  premises,  one  must  investigate  whether  there 
are  any  significant  differences  between  "pure"  arguments 
(without  premises)  and  "applied"  arguments  (with  premises). 
If  we  restrict  our  attention  to  arguments  with  only  finitely 
many  premises  (which  we  shall  throughout  this  work),  then 
any  differences  will  come  out  in  whether  the  Deduction 
Theorem holds or not. Let $\Delta \vdash \Phi$ stand for "(formula) $\Phi$ is a
valid conclusion from (the set of formulae) $\Delta$"; intuitively,
$\Delta$ is the set of premises and $\Phi$ the conclusion. The Deduction
Theorem states

if $\Delta \cup \{\psi\} \vdash \Phi$ then $\Delta \vdash (\psi \to \Phi)$

And since $\Delta$ is finite, repeated applications of the
Deduction Theorem will eventually yield

if $\Delta \cup \{\psi\} \vdash \Phi$ then $\vdash (\psi_1 \to (\psi_2 \to (\cdots \to (\psi \to \Phi)\cdots)))$

(where $\psi_1, \psi_2, \ldots$ are all the formulae in $\Delta$). Since in all
the systems under consideration

$(\Phi_1 \to (\Phi_2 \to \Phi_3))$

and

$((\Phi_1 \& \Phi_2) \to \Phi_3)$

are equivalent, we can more perspicuously state the result as

if $\Delta \cup \{\psi\} \vdash \Phi$ then $\vdash ((\psi_1 \& \cdots \& \psi) \to \Phi)$

That  is,  for  every  argument  there  is  a  theorem  corresponding 
to  it  which  has  the  above  form.  However,  in  some  of  the 
systems  under  consideration  here,  the  Deduction  Theorem  does 
not  hold  in  general.  The  following  is  a  valid  argument  in 
Kalish  &  Montague  and  in  resolution  systems 
$Py \vdash (\forall x)Px$

but

$\vdash (Py \to (\forall x)Px)$

is  not  a  theorem  in  either.  In  Jeffrey's  system  the  former 
is  not  a  valid  argument  and  the  latter  is  not  a  theorem;  the 
Deduction  Theorem  holds  in  this  system.  The  reason  it  does 
not  hold  in  Kalish  &  Montague  and  in  resolution  systems  is 
their  handling  of  formulae  with  free  variables.  When  they 
occur  as  premises  or  as  conclusions  in  the  Kalish  &  Montague 
system,  or  anywhere  in  a  resolution  system,  they  are 
implicitly  universally  quantified.  Thus  in  these  systems  the 
above  argument  is  equivalent  to 
$(\forall y)Py \vdash (\forall x)Px$

and the formula is equivalent to

$(\forall y)(Py \to (\forall x)Px)$

It  is  obvious  now  that  the  argument  is  valid  and  the  formula 
is  not  a  theorem.  In  Jeffrey's  system,  free  variables  are
taken  to  be  like  constants,  thus  explaining  why  neither  is
the  argument  valid  nor  the  formula  a  theorem. 
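
The contrast between the two readings of a free variable can be checked mechanically on a small domain. The following Python sketch is illustrative only and is not part of any of the systems discussed: it assumes a two-element domain and a single one-place predicate P, and all function names are hypothetical. It evaluates the argument and the corresponding formula once with y read as implicitly universally quantified (the Kalish & Montague and resolution reading) and once with y read as a constant (Jeffrey's reading). A finite search of this kind can exhibit counterexamples, but it is not in general a proof of validity.

```python
from itertools import product

def interpretations(domain):
    """Yield every possible extension of the one-place predicate P."""
    for bits in product([False, True], repeat=len(domain)):
        yield {d for d, b in zip(domain, bits) if b}

def forall_P(P, domain):
    """Truth value of (forall x)Px in the interpretation P."""
    return all(d in P for d in domain)

def closure_argument_valid(domain):
    # Premise Py read as (forall y)Py; conclusion (forall x)Px.
    return all(forall_P(P, domain)
               for P in interpretations(domain) if forall_P(P, domain))

def closure_formula_valid(domain):
    # The formula read as (forall y)(Py -> (forall x)Px).
    return all(all((y not in P) or forall_P(P, domain) for y in domain)
               for P in interpretations(domain))

def constant_argument_valid(domain):
    # y read as a constant: each interpretation also fixes a value for y.
    return all(forall_P(P, domain)
               for P in interpretations(domain) for y in domain if y in P)

def constant_formula_valid(domain):
    # Py -> (forall x)Px with y read as a constant.
    return all((y not in P) or forall_P(P, domain)
               for P in interpretations(domain) for y in domain)

domain = [0, 1]
print(closure_argument_valid(domain), closure_formula_valid(domain))
print(constant_argument_valid(domain), constant_formula_valid(domain))
```

Under the universal-closure reading the argument passes the check while the formula has a counterexample (P true of one element only), which is the failure of the Deduction Theorem described above; under the constant reading both fail together.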

This  sort  of  example  shows  that  there  really  is  an 
important  difference  between  theoremhood  and  valid 
arguments,  even  when  we  restrict  our  attention  to  arguments 
with  finitely  many  premises.  We  should  make  sure,  in 
evaluating  automatic  theorem  provers,  that  they  are  able  to 
handle  either  failure  or  success  of  the  Deduction  Theorem, 
depending upon which underlying system of logic is being

III. A BRIEF SURVEY OF AUTOMATIC THEOREM PROVING: METHODS AND STRATEGIES

A.  Soundness,  Completeness,  and  Decidability 

A  system  of  logic,  it  will  be  recalled  from  Chapter  II, 
is  a  recursively  specified  set  (possibly  empty)  of  axioms 
and  a  recursively  specified  set  of  rules  of  inference. 
Together  with  a  definition  of  proof,  these  specify  a  set  of 
theorems  of  the  logic;  and  so  in  some  sense  one  can  identify 
the  logic  with  its  set  of  theorems.  However,  as  we  have 
already  seen,  there  is  another  sense  in  which  the  logic  is 
not  the  set  of  theorems  because  there  may  be  a  difference  in 
what  arguments  two  systems  consider  valid  even  though  the 
theorems  are  the  same.  Since  our  interests  include  arguments 
in  general,  I  will  take  the  former  characterization  of  a 
logic as the object under discussion, rather than the "set
of  theorems"  characterization. 

It  is  also  my  intention  to  take  standard  notions  of 
"semantics"  as  given.  Thus  an  interpretation  of  the  logic 
will  be  a  model  structure  having  only  well-known  properties. 
A proposition (0-place predicate) has as value either true
or false, an n-place atomic predicate is assigned a set of ordered
n-tuples of items from the domain, a constant has as value an
element of the domain, an n-place function symbol is assigned
a set of ordered (n+1)-tuples of elements from the domain
in such a way that whenever $\langle a_1, a_2, \ldots, a_n, \alpha \rangle$ and
$\langle a_1, a_2, \ldots, a_n, \beta \rangle$ are in this set then $\alpha = \beta$. The propositional
connectives are given their usual truth functional
interpretations,  and  the  quantifiers  are  given  an  objectual 
interpretation.  Having  stated  such  an  interpretation  (of  the 
predicates,  constants,  functions),  one  goes  on  to  define 
what  it  is  for  a  formula  to  be  true  under  that 
interpretation.  We  shall  employ  standard  definitions  in  this 
regard:  An  argument  is  (semantically)  valid  just  in  case:  in 
every  interpretation  where  all  the  premises  are  true  so  is 
the  conclusion. 
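
For the propositional fragment, this definition of semantic validity can be tested directly by enumerating interpretations. The following Python sketch is a minimal illustration under that restriction (it does not handle quantifiers or domains); the helper names `is_valid` and `implies` are hypothetical, and formulae are represented simply as Python functions from an assignment to a truth value.

```python
from itertools import product

def is_valid(premises, conclusion, letters):
    """True if every assignment making all premises true also makes
    the conclusion true (propositional case only)."""
    for values in product([True, False], repeat=len(letters)):
        v = dict(zip(letters, values))
        if all(p(v) for p in premises) and not conclusion(v):
            return False  # this assignment is a counterexample
    return True

def implies(a, b):
    return (not a) or b

# Modus ponens: from p and (p -> q), infer q.
mp_valid = is_valid([lambda v: v["p"], lambda v: implies(v["p"], v["q"])],
                    lambda v: v["q"], ["p", "q"])

# Affirming the consequent: from q and (p -> q), infer p.
ac_valid = is_valid([lambda v: v["q"], lambda v: implies(v["p"], v["q"])],
                    lambda v: v["p"], ["p", "q"])

# A theorem in the technical sense: (p -> p) from no premises.
pp_theorem = is_valid([], lambda v: implies(v["p"], v["p"]), ["p"])

print(mp_valid, ac_valid, pp_theorem)
```

With an empty list of premises the same check tests theoremhood in the technical sense, which mirrors the distinction between theorems and valid conclusions drawn in Chapter II.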

For an abstract system of logic to be sound, every
argument one can construct a proof for must be (semantically)
valid. For it to be complete, every (semantically) valid
argument  must  have  some  proof.  Every  system  considered  in 
Chapter  II  is  both  sound  and  complete.10  (Of  course  since 
Jeffrey’s  system  has  different  provable  arguments  than,  say, 
Kalish  &  Montague's,  this  entails  that  they  will  have 
different  semantics.)  Since  one  has  some  particular 
understanding  of  the  symbols  in  mind  (a  semantics),  and  one 
is  using  the  logic  as  a  way  of  manipulating  the  symbols  so 
as  to  aid  in  characterizing  this  understanding,  it  is  clear 
that  one  wants  the  abstract  system  of  logic  to  be  sound  and 
complete.  For  if  not,  there  would  either  be  (semantically) 
"good"  arguments  that  were  not  correctly  so-called  (by  the 

10  Chapter  II  was  purposefully  vague  with  respect  to  which 
quantifier  rules  will  be  added  to  the  "semantically-based" 
systems,  and  the  axiomatic  system  was  left  unspecified 
beyond the implicational fragment. What is meant here is
that  there  are  straightforward  completions  (axioms,  rules  of 
inference)  for  all  these  systems  which  will  render  them 
sound  and  complete.

syntax),  or  else  there  would  be  (semantically)  "bad"
arguments  decreed  to  be  "good"  (by  the  syntax). 

What  has  thus  far  been  said  applies  to  the  abstract 
systems  of  logic  only.  When  these  systems  are  implemented  as 
a  computer  program,  various  other  matters  come  into  play  in 
considering  issues  such  as  completeness.  For  example,  any 
implementation  of  the  axiomatic  system  discussed  in  Chapter 
II  which  was  organized  in  such  a  way  that  (a)  the  theorem  to 
be  proved  had  to  be  a  substitution  instance  of  the 
consequent  of  an  axiom  and  (b)  the  antecedent  of  that  axiom 
had  to  be  a  substitution  instance  of  an  axiom,  would 
obviously  not  be  able  to  prove  all  theorems  of  the 
propositional  logic.  And  this  is  so  in  spite  of  the  fact 
that  the  system  as  a  whole  (which,  as  I  have  mentioned,  is 
complete)  allows  the  axioms,  substitution  in  axioms,  and 
modus  ponens  --  generally,  all  that  is  required  to  be 
complete.  The  idea  is  that  a  system's  being  complete  means 
that  some  proof  invoking  the  axioms,  substitution,  and  modus 
ponens  exists.  But  the  search  organization  of  the 
implementation -- while it is an implementation of a
complete  system  --  may  not  itself  be  complete. 

We  shall  see  below  similar  cases.  Indeed,  we  can  even 
interleave  another  level  of  completeness  here.  The 
resolution  inference  system  of  Chapter  II  is  a  complete 
(abstract)  system.  Thus  there  is  some  series  of  resolutions 
which yields $\Box$ for every (semantically) valid argument. Often
one can prove that there is some specific way of organizing
these resolutions such that every proof exhibits that
organization.  I.e.,  one  might  say  that  every  valid  deduction 
can  be  given  in  a  normal  form.  If  so,  then  one  says  that 
this  subsystem  (the  normalized  forms)  is  itself  complete  and 
thus  is  equivalent  to  the  entire  system. 

As  before,  we  can  talk  about  the  search  organization  of 
the  implementation  for  this  normalized  system,  and  find  it 
complete  or  not.  Unless  one  uses  exhaustive  searches  for 
one's  implementation,  it  is  extremely  difficult  to  achieve  a 
complete  search  organization  implementation.  This  is  so 
because  the  abstract  system's  being  complete  means  merely 
that  there  is  some  proof  or  other;  the  normalized  system's 
completeness  also  only  means  that  there  is  some  proof  or 
other  obeying  these  constraints.  But  for  a  search 
organization  implementation  to  be  complete  would  mean  that 
the  method  used  will  actually  find  the  proof,  given 
sufficient  time  to  look. 

Limitations  to  complete  implementation  come  from  two 
directions.  The  first  is  logical.  It  is  a  well-known  fact 
that  the  first  order  predicate  logic  is  undecidable,  even 
though  (abstractly)  complete.  What  this  means  is  that  as  a 
matter  of  logic  there  are  formulae  such  that  no  matter  how 
long  one  has  been  constructing  a  proof  by  a  complete  method, 
there  will  be  no  termination  and  one  does  not  know  whether 
the  formula  is  provable  (but  the  proof  not  yet  finished)  or 
whether  the  formula  is  not  provable.  So  as  a  matter  of  logic 
there  can  be  no  complete  implementation  for  the  first  order
predicate  logic:  even  in  the  limit  we  cannot  discover,  for
all  formulae,  whether  they  are  or  aren't  theorems.  The 
second  direction  of  limitations  comes  from  the  inherent 
finiteness  of  machine  implementations  as  opposed  to  the 
inherent  infiniteness  of  abstract  systems  of  logic.  As  one 
can  see  from  the  definitions  of  the  vocabulary,  terms,  and 
formulae  in  Chapter  II,  the  (abstract)  logic  we  are 
interested  in  has  an  infinite  number  of  variables,  terms, 
and  formulae.  Yet  by  its  very  nature,  an  implementation  is 
limited  --  for  example  in  the  size  of  a  formula  which  can  be 
stored  for  consideration.  Not  only  this,  but  also  (in  the 
abstract  system)  a  proof  can  grow  without  end  and  yet  be  a 
correct  proof.  Thus  there  will  be  proofs  which  cannot  be 
implemented,  even  if  we  are  talking  about  a  logic  which  is 
decidable  and  for  which  we  have  an  otherwise  complete 
implementation.  So,  although  our  system  of  logic  may  be 
complete  (as  is  the  first  order  predicate  logic),  and  even 
though  our  normalized  form  of  proof  may  be  complete  (as  is 
the  linear  resolution  we  shall  consider  shortly),  we  may 
never  be  able  to  find  a  proof  of  a  given  theorem.  I  shall 
not  consider,  in  this  thesis,  the  problems  of  finiteness  of 
implementation.  For  example,  the  system  proffered  in 
Chapters  V,  VI,  and  VII  has  only  40  variables  at  its 
disposal;  so  any  theorem  whose  proof  requires  41  variables 
will  not  be  found,  and  the  method  will  fail.  Yet,  since 
memory  and  size  constraints  are  continually  being  revised 
upward,  I  do  not  consider  this  to  be  a  crucial  limitation.
More  crucial  limitations  are  (a)  those  that  do  not  employ  a
complete  normalized  form  (e.g.,  unit  resolution,  which  we 
shall  consider  shortly),  and  (b)  those  which  while  utilizing 
a  complete  normalized  form  of  proof,  do  not  give  sufficient 
search  direction  to  find  a  proof. 

This  latter  is  a  matter  of  degree.  Thus  linear 
resolution  is  complete;  and  systems  utilizing  it  employ 
various  strategies  which  "help  proofs  get  started  in  the 
right  way".  Such  strategies  or  heuristics  are  to  be  judged 
by  how  often  they  work.  As  we  shall  see,  resolution-based 
implementations  are  particularly  weak  in  this  regard. 

B.  Early  Heuristic  Approaches 

Computerized  theorem  proving  begins  with  Newell  et  al 
(1957).  They  attempted  to  use  the  formalism  of  Whitehead  & 
Russell  (1910-1912)  for  the  propositional  calculus  and  to 
prove  theorems  of  this  theory.  They  claimed  interest  in 
human  problem  solving  techniques,  however;  so  they  devised  a 
proof  search  organization  which  would,  they  hoped,  mirror 
that  of  humans  attempting  this  same  task.  In  particular  they 
eschewed  (what  they  called)  the  British  Museum  Algorithm  of 
enumerating  all  possible  proofs  until  the  one  being  searched 
for  appears.  Of  the  various  proof  search  organization 
techniques  they  employed,  perhaps  the  most  interesting  one 
was  that  of  "working  backwards"  from  the  goal  to  be  proved 
to  a  "subproblem"  that  is  perhaps  simpler  than  the  original 
goal.  The  techniques  that  they  employed  were  called  (by
them)  "heuristics",  a  term  that  apparently  means
"intelligent  guesses  as  to  what  to  do  next  that  have  no 
assurance  of  working."  Their  success  at  proving  theorems  of 
the  propositional  calculus  was  very  modest:  the  previous 
chapter  shows  what  the  hardest  one  they  solved  was  and  gives 
an  example  of  a  theorem  which  their  "heuristics"  are 
incapable  of  solving. 

Gelernter's  (1963)  Geometry  Theorem  prover  was 
similarly  organized,  although  it  employed  a  method  of 
looking  at  a  "diagram"  which  was  offered  with  the  statement 
of  the  problem.  It  too  looked  "backward  from  the  goal",  but 
managed  to  cut  down  on  the  number  of  possible  "ways  to  go 
backward"  by  incorporating  this  "domain  specific  knowledge" 
about  what  is  true  in  the  diagram.  The  method  invoked  in 
these  two  theorem  provers  is  called  problem  reduction 
format,  a  concept  to  which  we  shall  return  shortly. 

C.  The  Decline  of  Heuristics 

The  1960's  saw  a  move  away  from  this  "heuristic"  method 
of  constructing  proofs.  Wang  (1963)  gave  an  algorithmic 
development  of  the  propositional  logic,  with  an  extension  to 
the  predicate  logic,  using  the  methods  of  Herbrand,  and 
argued  strongly  against  the  use  of  methods  which  were  known 
not  to  automatically  succeed  ("heuristics")  when  there  was 
an  automatic  method  available  (even  if  the  automatic  method 
was  not  how  humans  would  solve  the  problems).  We  shall 
return  to  this  issue  in  Chapter  IV.  This  general  Zeitgeist,
coupled  with  the  discovery  of  the  resolution  inference
system  (Robinson  1965,  1968),  encouraged  investigators  to 
look  for  methods  which  were  far  removed  from  anything  an 
actual logician would do to prove some alleged theorem.
Since the system had a single inference rule (and associated
substitution mechanism), it was felt that there was a sense
in which this would be the simplest system of logic.
Although elegant in this sense, the resolution procedure
also produces intermediate clauses (thus generating
subproblems) at an exponentially explosive rate.

A  straightforward  way  of  carrying  out  resolution  on  a 
set  of  clauses  S  is  to  compute  all  resolvents  on  pairs  of 
clauses  in  S,  add  these  resolvents  to  S,  compute  all  further 
resolvents,  add  them  to  S,  etc.,  until  either  the  null 
clause  appears  or  there  are  no  further  resolvents.  That  is, 
generate sequences $S_0, S_1, S_2, \ldots$ (levels of resolution)
where

$S_0 = S$

$S_n = \{\text{resolvents of } C_1 \,\&\, C_2 \mid C_1 \in (S_0 \cup \cdots \cup S_{n-1}),\ C_2 \in S_{n-1}\}$

This procedure is the level-saturation method. Chang & Lee
(1973: 92-93) carry out a simple example where $S = \{p+q,\ \neg p+q,\ p+\neg q,\ \neg p+\neg q\}$
and show that even in this simple case 34
further clauses (from two new levels) are generated before $\Box$
is  reached.  Inspection  of  these  new  clauses  reveals  that 
there  are  a  large  number  of  superfluous  clauses  falling  into 
three  types.  First,  some  of  these  clauses  are  tautologies. 
These cannot lead to $\Box$ since tautologies are true under any
interpretation: if S is unsatisfiable, then the result of
deleting a tautology from S will still be unsatisfiable.
(Simple  deletion  is  not  quite  all  there  is  to  the  issue, 
however.  If  S  contains  only  tautologies,  one  would  not  wish 
to delete them all and be left with $\Box$!) Second, a number of
the  clauses  produced  are  identical  with  already-produced 
clauses.  Clearly,  these  should  be  deleted.  Thirdly,  and 
including  the  second  as  a  special  case,  some  clauses  are 
entailed  by  an  already-present  clause.  Again  these  should  be 
deleted.  For  example,  if  P(x)  is  already  present,  then 
(P(a)+Q(a))  should  be  deleted,  since  P(x)  subsumes  (entails) 
this clause by the substitution of a for x. The deletion
strategy is the deletion of any tautology and any subsumed
clause.11 Employing this strategy, Chang & Lee (1973: 94-95)
show that the above example produces only five new clauses
before encountering $\Box$ as the first clause at the second
level.  A  special  case  of  this  is  to  delete  subsumed  literals 
within  a  clause  (unify).  Such  a  strategy  might  usefully  be 
called  a  heuristic  in  the  sense  that  it  applies 
organizational  control  to  the  level-saturation  method.  Every 
theorem  prover  surveyed  in  the  recent  literature  employs 
some  such  heuristic.12  But  even  with  this  (admittedly 
primitive)  heuristic,  it  has  been  found  that  clauses  are 
produced  at  too  rapid  a  pace  to  make  proofs  of  even 

11 This is actually a version of the Davis & Putnam (1960)
"Tautology Rule".

12 With the exception of Boyer's (1971) "lock resolution"
method, which will be discussed in the next section and
again in Chapter IV.

moderately simple theorems computationally feasible. And
this  problem  led  investigators  to  look  at  the  conditions 
under  which  the  production  of  intermediate  clauses  would  be 
kept  to  a  minimum:  i.e.,  to  more  powerful  heuristics.  It  is 
to  these  strategies  that  we  now  turn. 
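
The level-saturation method and the deletion strategy described above can be sketched for the propositional case as follows. This is a minimal illustration, not the procedure of Chang & Lee or of any system surveyed here: clauses are assumed to be sets of string literals with "~" marking negation, subsumption is reduced to the subset test that suffices when there are no variables, only newly generated clauses are tested for deletion, and the names `level_saturation`, `use_deletion` and `max_levels` are hypothetical.

```python
def negate(lit):
    """Complement of a literal written as 'p' or '~p'."""
    return lit[1:] if lit.startswith("~") else "~" + lit

def resolvents(c1, c2):
    """All propositional resolvents of two clauses (frozensets)."""
    return {(c1 - {lit}) | (c2 - {negate(lit)})
            for lit in c1 if negate(lit) in c2}

def is_tautology(clause):
    return any(negate(lit) in clause for lit in clause)

def level_saturation(clauses, use_deletion=False, max_levels=10):
    """Return (null_clause_found, number_of_new_clauses_kept)."""
    seen = {frozenset(c) for c in clauses}   # S_0 u ... u S_(n-1)
    previous = set(seen)                     # S_(n-1)
    kept = 0
    for _ in range(max_levels):
        level = set()                        # S_n
        for c1 in seen:
            for c2 in previous:
                for r in resolvents(c1, c2):
                    if r in seen or r in level:
                        continue             # identical clause already present
                    if use_deletion and (
                            is_tautology(r)
                            or any(old <= r for old in seen | level)):
                        continue             # tautology or subsumed clause
                    level.add(r)
        kept += len(level)
        if frozenset() in level:
            return True, kept                # the null clause was derived
        if not level:
            return False, kept               # saturated without refutation
        seen |= level
        previous = level
    return False, kept

S = [["p", "q"], ["~p", "q"], ["p", "~q"], ["~p", "~q"]]
print(level_saturation(S))
print(level_saturation(S, use_deletion=True))
```

Because this sketch merges identical clauses as soon as they appear, its clause counts need not match the tallies quoted from Chang & Lee; the point of interest is only that the deletion strategy keeps fewer intermediate clauses while still reaching the null clause.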

D.  The  Re-emergence  of  Heuristics 

Linear resolution (proposed independently by Loveland
(1970) and Luckham (1970)) is another overall way to
organize individual resolutions, akin to the level
saturation method of the last section.13 The general idea is
to pick a starting resolution, and from there on make
resolutions that use the most recent resolvent. The
advantage of this organization is in its clarity of
presentation and ease of stating which (class of)
resolution(s) to perform next. It is perhaps surprising that
linear resolution is complete, that is, if S is an
unsatisfiable set of clauses then there is some series of
resolutions such that each one operates on the resolvent of
the previous resolution and the last clause is $\Box$. This is
surprising because one offhand thinks that there must be
cases where two "independent" series of resolutions are
needed in order to generate clauses where, finally, there is
a resolution which "brings the two independent series
together." Such a situation happens often in natural
deduction and axiomatic systems. The reason this is not so

13 This entire section benefits considerably from Loveland
(1978) and Chang & Lee (1973).

in resolution systems has to do with the fact that in any
such  apparent  case,  so  long  as  the  two  alleged  "independent" 
series  eventually  can  be  "brought  together",  there  must  be 
some  atom  in  one  which  occurs  negated  in  the  other.  But  then 
this  pair  of  complementary  literals  must  have  occurred  in 
the  formulae  which  began  each  of  the  allegedly  independent 
series;  and  in  such  a  case,  there  is  a  different  derivation 
which  will  start  with  them  and  operate  in  accordance  with 
linear  resolution.  However,  a  word  of  deflation  is  in  order 
for  anyone  who  sees  linear  resolution  as  a  solution  for  how 
to  organize  a  resolution  procedure.  The  completeness  theorem 
says only that there is some linear resolution available;
it does not say how to find it. Thus suppose $S = \{(R+T+U),\ (P+Q),\ \neg P,\ \neg Q\}$.
The set S is unsatisfiable, but no linear
resolution starting with (R+T+U) will succeed. Clearly what
is needed here is some strategy to tell where to start the
linear resolution; for instance, here we note that (R+T+U) is
not resolvable with anything. But such a strategy is not
enough, since the breakdown might occur anywhere. A
"second-level breakdown" occurs if $\neg T$ is added to S and a
"third-level breakdown" occurs if both $\neg T$ and $\neg U$ are added.

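The dependence of linear resolution on its starting clause can be illustrated with a small propositional sketch. It is an assumption-laden illustration rather than any published procedure: clauses are sets of string literals with "~" for negation, the search is depth-first with a hypothetical `max_depth` bound, and side clauses are drawn from the input clauses and from earlier center clauses of the same derivation.

```python
def negate(lit):
    """Complement of a literal written as 'P' or '~P'."""
    return lit[1:] if lit.startswith("~") else "~" + lit

def resolvents(c1, c2):
    """All propositional resolvents of two clauses (frozensets)."""
    return {(c1 - {lit}) | (c2 - {negate(lit)})
            for lit in c1 if negate(lit) in c2}

def linear_refutation(clauses, top, max_depth=8):
    """True if a linear derivation of the null clause starts from `top`."""
    inputs = [frozenset(c) for c in clauses]

    def search(center, ancestors, depth):
        if not center:
            return True                      # null clause reached
        if depth == 0:
            return False
        # Each step resolves the most recent resolvent (the center clause)
        # against an input clause or an earlier center clause.
        for side in inputs + ancestors:
            for r in resolvents(center, side):
                if r != center and r not in ancestors:
                    if search(r, ancestors + [center], depth - 1):
                        return True
        return False

    return search(frozenset(top), [], max_depth)

S = [["R", "T", "U"], ["P", "Q"], ["~P"], ["~Q"]]
print(linear_refutation(S, ["R", "T", "U"]))            # start at (R+T+U)
print(linear_refutation(S, ["P", "Q"]))                 # start at (P+Q)
print(linear_refutation(S + [["~T"]], ["R", "T", "U"])) # second-level breakdown
```

Starting from (R+T+U) the search has nowhere to go, or stalls after one or two steps once $\neg T$ and $\neg U$ are added, while starting from (P+Q) reaches the null clause in two steps.
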
One  straightforward  strategy  would  be  to  delete  any 
clause  containing  an  atom  that  does  not  occur  in  another 
clause. Obviously such a clause can never lead to $\Box$ because
that literal can never be resolved away. A slight
generalization of this would be to delete clauses containing
a literal which is not complemented in another clause. (This
is Davis & Putnam's (1960) "pure literal rule".) If this
strategy is carried out iteratively, one arrives at a system
called graph resolution.

Kowalski  (1974,  1978)  provides  the  most  fully  described 
"connection  graph"  theorem  prover.  The  idea  behind  a 
connection  graph  strategy  is  first,  prior  to  trying  to  prove 
anything  by  resolution,  to  lay  out  the  possible  "resolvings 
out"  as  a  graph.  Thus  suppose  our  clauses  are 
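$P1 + P2 + P3$
$\neg P3 + P4$
$\neg P4 + \neg P1$

We draw connections between the literals which might resolve
out, thus

[Figure: connection graph linking P3 with $\neg$P3, P4 with $\neg$P4, and P1 with $\neg$P1; P2 is unconnected.]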

Any  clause  which  contains  a  literal  that  is  unconnected 
cannot  possibly  lead  to  the  derivation  of  the  null  clause, 
and  so  is  deleted,  along  with  any  of  its  connections.  So  in 
the  above  example  we  delete  the  first  clause  and  its 
connections,  leaving 

$\neg P3 + P4$
$\neg P4 + \neg P1$

But  these  also  now  have  unconnected  literals  and  thus  are  to 
be  deleted.  If  there  were  no  other  clauses,  we  would  know 
prior  to  starting  the  resolution  portion  of  an  attempted 
proof,  that  it  is  not  a  theorem  and  hence  not  to  start.  Now
consider  a  slightly  more  complicated  case.  Suppose  we  have
these  clauses  (amongst  others  --  their  connections  to  the 
ones  listed  are  indicated  by  lines  to  nowhere) 


(The  literals  of  the  new  clause  introduced  as  a  conclusion 
are  connected  to  whatever  P3  and  P4  were  connected  to  in  the 
premise  clause;  and  a  connection,  once  resolved  upon,  is 
deleted). We now recheck for clauses containing unconnected
literals.  Using  this  technique,  we  can  manage  to  cut  down  on 
many  of  the  unwanted  resolutions  that  would  crop  up. 
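
The iterated deletion of clauses with unconnected literals can be sketched for the propositional case as below. This covers only the initial pruning step, under the assumption that a literal is "connected" exactly when its complement occurs in some other clause; it does not model the inheritance or deletion of connections after a resolution, and the function name `purge_unconnected` is hypothetical.

```python
def negate(lit):
    """Complement of a literal written as 'P1' or '~P1'."""
    return lit[1:] if lit.startswith("~") else "~" + lit

def purge_unconnected(clauses):
    """Repeatedly delete any clause containing a literal whose
    complement occurs in no other remaining clause."""
    remaining = [frozenset(c) for c in clauses]
    changed = True
    while changed:
        changed = False
        for i, clause in enumerate(remaining):
            others = remaining[:i] + remaining[i + 1:]
            # A literal is unconnected if no other clause contains its complement.
            if any(all(negate(lit) not in o for o in others) for lit in clause):
                remaining = others       # delete the clause and recheck
                changed = True
                break
    return remaining

example = [["P1", "P2", "P3"], ["~P3", "P4"], ["~P4", "~P1"]]
print(purge_unconnected(example))
```

On the three-clause example the first clause goes because P2 is unconnected, after which the other two lose their connections in turn and nothing is left, so no resolution need be attempted.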

One of the obvious strategies one might think of is
called unit preference, and was introduced by Wos et al
(1965).14 According to Loveland (1978: 98) it "is still not
only the best known strategy, but...is one of the best
refinements known to date." In order to deduce $\Box$ from a set
of clauses, one must obtain successively shorter and shorter
clauses. Unit resolution provides such a mechanism. The idea
is to perform resolution operations where at least one
parent clause is a unit clause (literal) before it would
normally be performed under the search plan being employed.
For example, if level saturation resolution is being
employed, then prior to doing all the resolutions at a given
level, first do the unit resolutions at that level and the

14 Actually it is the "one literal rule" of Davis & Putnam
(1960).

unit resolutions involving the new clauses thus generated,
etc. Usually there is some mechanism to prevent creation of
an infinite sequence of unit resolutions.15 For example
Pf(x) and ($\neg$P(x)+Pf(x)) unit resolve to obtain the unit
clause Pf(f(x)) which starts such an infinite sequence.

Another  obvious  strategy  to  employ  is  input  resolution. 
This  is  to  make  at  least  one  parent  clause  of  each 
resolution  be  an  input  clause,  i.e.,  from  S0.  The  special 
appeal  here  is  that  the  number  of  intermediate  clauses  will 
be  kept  small  by  this  strategy,  as  an  informal  consideration 
of  examples  will  show.  Chang  (1970)  showed  that  unit 
resolution  and  input  resolution  are  equivalent:  there  is  a 
unit  refutation  from  a  set  of  clauses  S  if  and  only  if  there 
is  an  input  refutation  from  S. 
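
As a rough illustration of unit resolution, the following Python
sketch saturates a set of ground (propositional) clauses using only
resolutions in which one parent is a unit clause. The string encoding
of literals ("P", "~P") and the function names are assumptions made
for this illustration, and unification is left out entirely.

```python
from typing import FrozenSet, Iterable, Set


def negate(lit: str) -> str:
    """Return the complement of a literal written as "P" or "~P"."""
    return lit[1:] if lit.startswith("~") else "~" + lit


def unit_refutation(clauses: Iterable[Iterable[str]]) -> bool:
    """True if the empty clause is derivable using unit resolution only."""
    known: Set[FrozenSet[str]] = {frozenset(c) for c in clauses}
    changed = True
    while changed:
        changed = False
        # Collect the unit clauses available at the start of this pass.
        units = [next(iter(c)) for c in known if len(c) == 1]
        for u in units:
            for c in list(known):
                if negate(u) in c:
                    resolvent = c - {negate(u)}
                    if not resolvent:
                        return True  # empty clause reached
                    if resolvent not in known:
                        known.add(resolvent)
                        changed = True
    return False
```

On the clauses P, (¬P+Q) and ¬Q the sketch finds a unit refutation;
on a set that contains no unit clause at all it cannot even begin, so
it reports failure.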

Unit refutation is not complete: not every unsatisfiable set
will generate □ using only unit clauses, e.g., the example
S = {(P+Q), (¬P+Q), (P+¬Q), (¬P+¬Q)} obviously has no unit
refutation. Hence by Chang's theorem, it has no input
refutation either. This raises the question: what is the class
of unsatisfiable sets that do have a unit or input refutation?
The answer is Horn sets. A Horn formula is a clause which has
at most one positive (unnegated) literal, and a Horn set is a
set of Horn formulae. This is an important set of formulae
because many natural language "problems" can be stated as Horn
formulae.16 Even some other statements can be converted to
Horn formulae by "consistent renaming of predicates" by their
negations (see Meltzer 1966). If one clause has two positive
literals, yet renaming is impossible, this formula can be
converted into two Horn formulae by a "splitting rule", since
the following is a tautology:

    (S & (A1+A2+C)) ↔ (S&(A1+C)) + (S&(A2+C))

where S is the conjunction of the other clauses of the set,
and (A1+A2+C) is the formula with two positive literals (A1
and A2) and a disjunction of negative literals C. If the set on
the left side of ↔ is unsatisfiable, then each of the
disjuncts on the right side (each disjunct is now a Horn set)
has a unit (or input) refutation. However, there may be shared
variables between A1, A2, and C; and that means that it is
possible that the variable instantiations used in showing
(S&(A1+C)) unsatisfiable and showing (S&(A2+C)) unsatisfiable
are incompatible and so there is no single instantiation which
shows (S&(A1+A2+C)) is unsatisfiable.17 What is needed is some
method of showing the interaction of variables in the two
sub-refutations. Typically, one processes one of the Horn
formulae, and retains all the instantiations of variables,
applying the appropriate instantiation to the second formula.
(The method used is the same as "answer extraction" discussed
in Chapter I).

15 Loveland 1978: 99 says to "arbitrarily choose" an upper
bound on levels.

16 The clause form of Horn formulae is (¬φ1+¬φ2+...+φn), which
is equivalent to (φ1→(φ2→...→φn)) and to ((φ1&φ2&...)→φn), a
very common natural language form, corresponding as it does
to a GPS-style problem setting and to natural language
universal statements.

17 That is, just because each of S&(A1+C) and S&(A2+C) has a
refutation, it doesn't follow that S&(A1+A2+C) does.

Various  other  splitting  techniques  are  available,  but  I 
shall  not  discuss  them  further.  See  Chang  &  Lee  (1973), 
Henschen  &  Wos  (1974),  and  Nevins  (1974),  all  of  which  are 
refinements  of  Davis  &  Putnam  (1960). 
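
The splitting rule rests on a purely propositional equivalence,
which can be checked by brute force. The sketch below treats S, A1,
A2 and C as four independent truth values (an assumption that
ignores variables altogether) and enumerates all sixteen
assignments; the function name is chosen only for this illustration.

```python
from itertools import product


def splitting_rule_is_tautology() -> bool:
    """Check (S & (A1+A2+C)) <-> (S&(A1+C)) + (S&(A2+C)) on all assignments."""
    for s, a1, a2, c in product([False, True], repeat=4):
        left = s and (a1 or a2 or c)
        right = (s and (a1 or c)) or (s and (a2 or c))
        if left != right:
            return False
    return True
```

The check says nothing about the first order case, where shared
variables among A1, A2 and C are exactly what makes the two
sub-refutations hard to combine.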

A  "generalization"  of  the  unit  preference  strategy, 
called the fewest literal preference strategy, was proposed
by Slagle (1965). One does a "modified depth first" method
where, when a pair of formulae is chosen to be resolved upon,
one continues to resolve upon that resolvent, its resolvents,
etc., until some specified threshold number of levels has been
reached. When this happens, pick another pair of formulae to
be resolved upon in this manner, etc. (Of course, if □ is
derived at any step, terminate with a proof). At each stage
one wishes to process the "likely looking" candidates before
the "bad" candidates. Since we are trying to reach □, the
"most likely" candidates will be ones that result in the
smallest resolvent (smallest in the sense of having fewest
literals). A measure of this is the sum of the lengths of the
parents. Hence, do resolution in this order. Obviously the
method is complete, since eventually all resolvents will be
computed.

Using length of parent clauses is only one of a number of
possible measures of "goodness" of a proposed resolution.
Among the other possibilities mentioned by Chang & Lee
(1973: 154) for judging whether to resolve C with B are

1. The number of literals in C
2. The number of clauses which can resolve with C
3. The number of constants in C
4. The number of function symbols in C
5. The number of distinct variables in B and C
6. The number of constants in C / (1 + the number of
   variables in C)
7. The number of literals in both B and C
8. The number of distinct predicate letters in B and C
9. The length of C + the length of B

Using standard methods from the analysis of variance, one
estimates how "good" it is to resolve C against B by
weighting each of the possibilities and summing them. Thus
letting f1, ..., f9 stand for the nine listed features we might
find important, the estimate h(C,B) of how good the
resolution of C against B is, will be given by

    h(C,B) = w0 + w1 f1(C,B) + ... + w9 f9(C,B)
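
A weighted sum of this kind can be sketched in Python as follows.
Only four of the nine features (f1, f3, f5 and f9) are computed; the
clause encoding, the convention that arguments beginning with "?"
are variables, the choice to count constant occurrences rather than
distinct constants, and the weights in the usage note are all
assumptions made for this illustration.

```python
from typing import List, Sequence, Set, Tuple

# A literal is (is_positive, predicate, arguments). By assumption, arguments
# starting with "?" are variables and all other arguments are constants.
Literal = Tuple[bool, str, Tuple[str, ...]]
Clause = List[Literal]


def constants(clause: Clause) -> List[str]:
    """All constant occurrences in the clause (repeats are counted)."""
    return [a for _, _, args in clause for a in args if not a.startswith("?")]


def variables(clause: Clause) -> Set[str]:
    """The distinct variables of the clause."""
    return {a for _, _, args in clause for a in args if a.startswith("?")}


def features(c: Clause, b: Clause) -> List[float]:
    return [
        float(len(c)),                            # f1: literals in C
        float(len(constants(c))),                 # f3: constants in C
        float(len(variables(b) | variables(c))),  # f5: distinct variables in B and C
        float(len(c) + len(b)),                   # f9: length of C + length of B
    ]


def h(c: Clause, b: Clause, w0: float, weights: Sequence[float]) -> float:
    """Linear estimate w0 + sum of w_i * f_i(C, B) over the chosen features."""
    return w0 + sum(w * f for w, f in zip(weights, features(c, b)))
```

For instance, with C = (P(a,?x) + ¬Q(?x)), B = Q(?y), w0 = 1.0 and
the made-up weights 0.5, 0.25, -1.0, 2.0, the features are 2, 1, 2
and 3 and the estimate is 6.25.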

If we know a number of specific values of h*(C,B) -- how
"good" in a number of different cases resolution of C against
B really turned out to be -- we can use standard regression
analysis techniques to get values for w0, w1, ..., w9. This
then raises the question, how do we know if a set of
features is good? If the set is good, then after a number of
examples have been used to obtain the wi's so as to get h, it
should be useful in new cases. If it does not apply well to
new cases, new features should be found. h(C,B) is called the
heuristic evaluation function; see Slagle & Farrell (1971) for
further discussion and examples.

One of the first and still most popular strategies is the
set of support strategy of Wos, Robinson & Carson (1965).
Recall that a deduction of C from premises P1, P2, ... amounts
to showing that {P1, P2, ..., ¬C} is unsatisfiable. Since
{P1, P2, ...} is generally satisfiable, it is perhaps wise to
avoid resolving their clauses against one another. This is
what the set of support strategy tries to accomplish. Given
a set S of clauses, a subset T of S is called a set of
support if (S-T) is satisfiable. A set of support resolution
is a resolution such that not both parents come from (S-T),
and a set of support deduction is one where every resolution
is a set of support resolution. It is quite straightforward
to show that this strategy is complete: if S is a (finite)
set of unsatisfiable clauses and T a subset of S such that
(S-T) is satisfiable, then there is a set of support
deduction of □ from S with T as the set of support.
Obviously there can be more than one set of support for an
unsatisfiable S. In Chapter IV I consider some of the
problems this observation raises for implementing set of
support strategies in a reasonable way.
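
The restriction can be illustrated with a small Python sketch for
ground (propositional) clauses. The literal encoding ("P", "~P"),
the agenda-style search and the function names are assumptions of
this illustration; it is not a reconstruction of any particular
prover.

```python
from typing import FrozenSet, Iterable, Set

Clause = FrozenSet[str]


def negate(lit: str) -> str:
    return lit[1:] if lit.startswith("~") else "~" + lit


def resolvents(c1: Clause, c2: Clause) -> Set[Clause]:
    """All non-tautologous binary resolvents of two ground clauses."""
    out: Set[Clause] = set()
    for lit in c1:
        if negate(lit) in c2:
            r = (c1 - {lit}) | (c2 - {negate(lit)})
            if not any(negate(x) in r for x in r):
                out.add(frozenset(r))
    return out


def sos_refutation(clauses: Iterable[Iterable[str]],
                   support: Iterable[Iterable[str]]) -> bool:
    """True if the empty clause is derivable when no resolution takes
    both of its parents from outside the set of support."""
    supported: Set[Clause] = {frozenset(c) for c in support}
    others: Set[Clause] = {frozenset(c) for c in clauses} - supported
    agenda = list(supported)
    while agenda:
        given = agenda.pop()  # always a supported clause
        for partner in list(others | supported):
            for r in resolvents(given, partner):
                if not r:
                    return True
                if r not in supported and r not in others:
                    supported.add(r)  # resolvents inherit support
                    agenda.append(r)
    return False
```

With premises (¬P+Q) and P and the negated conclusion ¬Q as the set
of support, the sketch derives the empty clause without ever
resolving the two premises against each other.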

Hyperresolution was introduced by Robinson (1965). The
idea behind it is that one can specify a setting18 to aid in
the choice of which formulae to resolve against each other.
A setting is intended to be a way of dividing all clauses
into two classes: those true in the setting vs. those false
in the setting. This is accomplished by making the setting
be a consistent list of literals which occur in the clauses.
(Since the set S was unsatisfiable, no setting can satisfy
every member of S nor falsify every member of S; thus S is
partitioned into two non-empty subsets). A resolution is
performed only when the parents each come from different
sides of the partition, and the resolvent is then placed in
the appropriate side. Of special interest are the two
settings:

    {A | A is an atom in S}

    {¬A | A is an atom in S}

called the positive and negative setting, respectively.
Resolution using the negative setting is hyperresolution,
that using the positive setting is called P1-resolution.19
Chang & Lee (1973: 108) point out that in a very common
case, the premises of an argument are represented by either
positive or mixed clauses, while the negation of the
conclusion is represented by a negative clause. In this
case, hyperresolution is best viewed as "working forward"
(from the premises) to deduce the original conclusion and
thence resolve against its negation; while P1-resolution is
best viewed as "working backward" from the negation of
the conclusion to □. It should be noted here also that
"consistent renaming of predicates" involving uniform
replacement of predicates by their negations (Meltzer 1966)
can aid in finding reasonable settings. Meltzer considers
some cases involving the number of positive and negative
literals in a set of clauses S where this information is
used to find good settings. (Meltzer calls this
Pp-resolution).

18 Or interpretation, model, partition, partial
interpretation, etc. The literature is rife with
terminology. I here follow Loveland (1978: 116).

19 Mysteriously, Chang & Lee (1973: 108) call the former
positive hyperresolution and the latter negative
hyperresolution. Their reason is that when the strategies
are used, all resolvents of the former will be negation-free
while in the latter they will be completely negative.

The fact is that the set of support strategy, the
P1-resolution strategy, the Pp-resolution strategy, and the
hyperresolution strategy are all special cases of a more
general strategy: semantic resolution (Chang & Lee 1973,
Chapter 6, who attribute it to Slagle 1967). It should be
obvious that P1-resolution, Pp-resolution, and
hyperresolution differ only in what they choose as the
setting. In fact semantic resolution amounts only to saying
that some setting should be chosen and used to divide the
clauses into two disjoint sets. Clearly the present types of
resolution merely offer advice on how to choose a setting.
The same is true with the set of support strategy. One
constructs a setting, usually using the negation of the
conclusion, and constructs the two sets C (the set of
support) and (S-C). All resolutions are to come from parents
in different sets. When a resolvent is found it goes into
the (S-C) set and will be resolved against a member of the C
set. Obviously this is again an example of the general
semantic resolution. Semantic resolution is complete in the
following sense: if S is unsatisfiable and I is any
setting20 then I partitions S into C and (S-C) and there is
a deduction of □ from S such that each resolution has a
parent from each of C and (S-C). From this completeness
theorem, the completeness of any of the special cases
follows.

One  final  strategy  should  be  mentioned  here,  as  it  is 
common  in  the  literature  and  takes  up  considerable  space  in 
textbook  presentations  such  as  Chang  &  Lee  (1973)  and 
Loveland  (1978),  and  that  is  ordering.  Suppose  we 
arbitrarily  give  an  ordering  to  the  predicate  symbols 
occurring  in  a  set  of  clauses  S,  and  require  that  when  two 
clauses  are  resolved,  we  always  resolve  upon  the  largest 
literal. Thus for example, if we have an ordering P>Q>R and
the two clauses (P+R) and ¬R, we would not be allowed to
resolve them because R is not the largest literal in the two
clauses. One can add this sort of order-of-predicate
strategy to either linear or semantic organizations of
resolution and the resulting system is complete: from an
unsatisfiable set of clauses there will be the appropriate
(linear, semantic) ordered deduction of □. Nonetheless, as
mentioned  before,  such  information  does  not  uniquely 
determine  which  resolution  to  start  with  to  even  get  a 
deduction  of  0,  much  less  an  optimal  one.  All  we  know  is 
that  there  is  one,  not  where  to  start  or  how  to  proceed  in 
order  to  find  it. 
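
The ordering restriction itself is easy to state in code. The
following Python sketch handles ground (propositional) clauses
only; the list-based ordering (largest predicate first), the literal
encoding and the function names are assumptions of this
illustration.

```python
from typing import List, Sequence


def predicate(lit: str) -> str:
    """Strip the negation sign from a literal such as "~R"."""
    return lit.lstrip("~")


def largest(clause: Sequence[str], order: List[str]) -> str:
    """The literal whose predicate comes first in `order` (largest first)."""
    return min(clause, key=lambda lit: order.index(predicate(lit)))


def may_resolve(c1: Sequence[str], c2: Sequence[str], order: List[str]) -> bool:
    """Allow a resolution only upon the largest literal of each clause."""
    l1, l2 = largest(c1, order), largest(c2, order)
    return predicate(l1) == predicate(l2) and l1 != l2
```

Under the ordering P>Q>R, the clauses (P+R) and ¬R are rejected,
while (P+R) and (¬P+Q) may be resolved upon P.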


20 Recall that settings must be consistent.

There is another refinement of ordering that should be
mentioned, and that is to order individual clauses rather
than just the predicates as a whole. This refinement is
carried out differently in the semantic and linear systems.
To order a clause, one merely uses the order of occurrence
(left-to-right) of the predicates in the clause. (One could
choose some other order, but this one is easy and
illustrates all the points.) Now for some technical details.
If two or more literals of an ordered clause C (with the
same sign) have a most general unifier, substitute the
unifier uniformly and delete all larger occurrences of the
literal (i.e., those later in the clause) to obtain the
ordered factor of C. If C1 and C2 are ordered clauses with
no variables in common, and L1 and L2 are literals in C1 and
C2 respectively, and L1 and ¬L2 have a most general unifier
s, and if C is the ordered clause resulting by disjoining
C1s and C2s, removing L1s and L2s and deleting any literal
identical to a smaller (in C) literal, then C is an ordered
binary resolvent of C1 against C2. (Note that this notion is
not symmetric). Now, C is an ordered resolvent of C1 against
C2 when: (a) C1 and C2 are both ordered clauses, (b) C is an
ordered binary resolvent of X against Y where X (and Y) is
either C1 (or C2) or an ordered factor of C1 (or C2). Now
suppose I is a setting. An ordered semantic clash with
respect to I occurs when (a) there is a sequence of ordered
clauses C1, C2, ..., Cn (b) C1, C2, ..., Cn are all false in
I (c) For each i=1,...,n there is an ordered resolvent Ri+1
of Ci against Ri (d) The literal in Ci that is resolved upon
is the last literal in Ci, and the literal resolved upon in
Ri is the largest literal that has an instance true in I,
and (e) Rn+1 is false in I. An example would perhaps help.
Chang & Lee (1973: 115) give this: Let S be the set of
ordered clauses {(Q(a)+R(x)), (¬Q(x)+R(x)), (¬R(x)+¬S(a)),
S(x)}. Let I be an interpretation where every literal is
negative. Then the following is an ordered semantic
deduction.

    (Q(a)+R(x))  S(x)  (¬R(a)+¬S(a))   yields  Q(a)

    Q(a)  (¬Q(x)+R(x))   yields  R(a)

    R(a)  S(x)  (¬R(a)+¬S(a))   yields  □
Slagle  and  Norton  (1971)  experimented  with  ordered 
clause,  semantic  resolution.  Unfortunately,  it  is  not 
complete. The following set of formulae is inconsistent but
there is no ordered clause, semantic refutation of it:
{(P+Q), (Q+R), (R+W), (¬R+¬P), (¬W+¬Q), (¬Q+¬R)}.21

A  somewhat  different  refinement  of  ordered  clause, 
semantic  resolution  is  Boyer's  (1971)  lock  resolution  where 
each  occurrence  of  a  literal  is  assigned  an  integer  index; 
different  occurrences  of  the  same  literal  in  the  set  of 
clauses  may  be  indexed  differently.  Resolution  is  then 
permitted  only  on  literals  of  the  lowest  index  in  each 
clause. The literals in the resolvents inherit their indices
from their parent clauses; if there is more than one
possible index assigned this way, it is given the lowest
index. This kind of ordered clause, semantic resolution is
complete: there is some lock deduction of □ from S, if S is
unsatisfiable. But there are other problems with it which
will be discussed in Chapter IV.

21 The example is due to Anderson (1971).

It  was  mentioned  before  that  input  resolution  is  not 
complete:  there  are  proofs  in  which  some  resolutions  have  to 
be  made  between  two  clauses  which  are  each  themselves 
resolvents  of  other  clauses.  It  would  be  nice  to  be  able  to 
identify  such  resolvents.  Loveland  (1968)  introduced  the 
model  elimination  strategy  for  this.  Basically  the  idea  is 
to  record  which  literals  have  been  resolved  upon  and 
therefore  deleted;  in  fact,  if  we  have  determined  that  we 
must  use  some  previously  generated  resolvent  --  we  need  not 
even  remember  which  one  --  we  merely  perform  some  operation 
on  the  clause  to  obtain  a  new  clause.  When  two  (ordered) 
clauses  are  resolved  against  one  another,  we  keep  the 
(unnegated) literal resolved upon in a special format, say
as italicized (shown here in square brackets). Such a literal
is said to be framed. Thus (P+Q) and (¬Q+R), both treated as
ordered clauses where the first is resolved against the
second, yield (P+[Q]+R). Some details now are that a framed
literal which does not precede any unframed literals is
deleted, that for multiple copies of the same framed literal
we delete all but the left-most one, and that whenever an
unframed literal of a clause is complementary to a framed
literal, then that unframed literal is framed as well.
(Intuitively, this last occurs when some previously-generated
resolvent will be needed to resolve against the current one.
Here we do not remember what one it was, only that there had
to be one such.) So as an example (taken from Chang & Lee
1973: 137-138), consider S to consist of the ordered clauses
{(P+Q), (P+¬Q), (¬P+Q), (¬P+¬Q)}. Starting with (P+Q), the
last literal of this ordered clause is Q, which can be
resolved against (P+¬Q). So, recording the relevant
information, we get (P+P+[Q]); but then we delete duplicate
occurrences of P, and note that [Q] is not followed by any
unframed literal and so is deleted. Thus we get P. Its last
literal is P, which can be resolved against (¬P+Q). The
answer yielded here is ([P]+Q). Its last literal is Q, which
can be resolved against (¬P+¬Q). The answer obtained is
([P]+[Q]+¬P). We now note that the last literal of this
formula is the negation of one of the framed ones. We
therefore frame it also, yielding ([P]+[Q]+[¬P]). But now
there are no unframed literals after the framed ones, so we
delete the framed literals, yielding □. It can be shown that
if S is unsatisfiable, then there is a linear resolution
using the model elimination ordering strategy which will
generate □. One might also note that set of support is a
special case of this strategy.
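
The bookkeeping in this example can be replayed mechanically. The
Python sketch below covers the ground (propositional) case only and
represents a framed literal by a boolean flag; the names tidy,
extend and reduce_last are invented for this illustration and are
not Loveland's or Chang & Lee's terminology.

```python
from typing import List, Tuple

# A chain element is (literal, framed); literals are strings like "P" or "~P".
Element = Tuple[str, bool]
Chain = List[Element]


def negate(lit: str) -> str:
    return lit[1:] if lit.startswith("~") else "~" + lit


def tidy(chain: Chain) -> Chain:
    """Drop repeated elements (keeping the left-most copy), then delete
    framed literals that are not followed by any unframed literal."""
    seen, out = set(), []
    for elem in chain:
        if elem not in seen:
            seen.add(elem)
            out.append(elem)
    while out and out[-1][1]:
        out.pop()
    return out


def extend(chain: Chain, side: List[str]) -> Chain:
    """Resolve the last literal of `chain` against the clause `side`,
    framing the literal resolved upon."""
    last, framed = chain[-1]
    if framed or negate(last) not in side:
        raise ValueError("last literal cannot be resolved against this clause")
    rest = [(lit, False) for lit in side if lit != negate(last)]
    return tidy(chain[:-1] + [(last, True)] + rest)


def reduce_last(chain: Chain) -> Chain:
    """Frame the last literal if it is complementary to a framed literal."""
    last, framed = chain[-1]
    if not framed and (negate(last), True) in chain:
        return tidy(chain[:-1] + [(last, True)])
    return chain
```

Starting from (P+Q), extending with (P+¬Q), then (¬P+Q), then
(¬P+¬Q), and finally applying reduce_last leaves an empty chain,
matching the refutation traced above.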

E.  The  Retreat  from  Resolution,  I 

The  first  thing  one  notes  about  resolution  is  the 
completely  unnatural  format  required,  clausal  form.  With  the 
exception  of  those  who  write  automatic  theorem  provers  (and 
those people only while they are writing), no one uses
clause form to represent problems whether they be in
mathematics,  program  specification,  or  even  pure  logic.  Thus 
there  is  perceived  a  need  to  construct  a  resolution-based 
system  which  allows  analogues  of  the  resolution  rule  to 
operate  on  arbitrary  formulae  of  first  order  logic.  So  there 
has  been  a  search  for  a  system  with  a  single  inference  rule 
that can be used like this. Murray (1982) proposed what he
calls NC-resolution, a procedure which is extraordinarily
complex  and  involves  reducing  truth  functional  expressions 
to  simpler  ("reduced")  ones,  replacing  sub-formulae  by  truth 
values,  and  finally  unifying  the  formulae  (after  checking 
their  "polarity"  --  the  number  and  position  of  negations  in 
the  subformulae)  to  arrive  at  a  disjunction  of  the  reduced 
formula.  A  proof  is  complete  when  F  ("the  false  formula")  is 
reached.  The  actual  method  used  after  the  reduction  is  a 
"semantic  resolution"  of  the  sort  described  in  the  last 
section.

The  1970's  saw  more  and  more  moves  away  from  the  pure 
resolution  systems.  A  number  of  these  newer  approaches 
attempted  to  incorporate  natural  deduction  techniques  to 
divide  "hard"  problems  up  into  several  easier  ones  and  then 
turn  these  easier  ones  over  to  a  resolution  prover.  The 
Davis-Putnam  (1960)  "split"  rule,  discussed  in  the  last 
section  with  reference  to  Horn  clauses,  is  a  primitive 
version  of  such  a  heuristic  strategy.  Two  works  should  be 
singled out here. Nevins (1974) starts with an unsatisfiable
set S of formulae and removes formulae from it one at a time
(complex formulae first), breaking them down by rules akin
to Jeffrey's (1967) rules of Chapter II, and placing the
results into set S1. Thus if ¬(P→Q) is in S, it will get
removed and the formulae P and ¬Q will be put into S1; if ¬Q
is in S1 and (P→Q) is removed from S, then (P→Q) and ¬P are
added to S1. When S1 contains a formula and its negation,
the proof is finished. Disjunctions are handled differently
by being split into subproblems. If (P+Q+R+...) is in S then
(a) if no variable in P subsumes any of Q, R, ..., make a
copy of S1 and add P to it and start again. If successful,
do the same with Q, then R, etc. If all of them yield a
proof, then the set S is unsatisfiable. (b) If P does
subsume  one  of  the  other  formulae  in  the  disjunction,  a  more 
complex  plan  is  required.  The  mechanism  proposed  (pp. 
609-611)  attempts  first  to  find  other  splits,  and  failing 
that  attempts  to  keep  track  of  all  the  variables  as  if  they 
"depended  upon"  the  disjunction  being  split.  Nevins 
considers  the  effect  of  this  method  when  another  such 
disjunctive  split  is  required  and  there  are  unifiers  within 
both  disjunctions  and  between  the  disjunctions.  So  far  as  I 
can  see,  his  method  cannot  work  in  general.  Nevins  does  not 
consider  the  method  discussed  in  the  last  section  concerning 
splitting  "almost"  Horn  formulae. 

Bledsoe  (1971)  describes  a  similar  system.  In  this 
system  a  number  of  attempts  are  made  to  simplify  and  break 
up  the  to-be-proved  theorem  before  the  results  are  sent  on 
to a resolution subroutine. Many of these correspond to the
tactics one would use in Kalish & Montague's system, e.g.,
if one wishes to show that (P&Q) is a theorem, one would
separately show that P is a theorem and then that Q is.
Similarly, to show that (P↔Q) is a theorem, one shows that
each of (P→Q) and (Q→P) is a theorem. Bledsoe's system even
uses some strategies not immediately available in Kalish &
Montague's system,22 for instance to prove ((P+Q)→R) prove
first each of (P→R) and (Q→R). It should be noted that these
splitting strategies actually only apply to the theorem to
be proved. Once the theorem has been "split" into the
(hopefully) simpler subproblems, the resolution routine is
called on them. A "tight timelimit" is kept on resolution,
and if the proof is not forthcoming within the timelimit, a
set of "actions" is taken. The actions to be taken depend
upon the specific domain under study; for instance, in his
(1971) the domain was set theory, so various set theoretic
reductions were employed as "actions" (and called
"reductions"). E.g., in certain (syntactically defined)
formula-positions such as consequents of conditionals, the
set equality A=B was replaced by the inclusions (A⊆B)
& (B⊆A), and sometimes the inclusion (A⊆B) was replaced by
set membership (Λx)(x∈A→x∈B). Splitting is done again and
the resolution program is recalled.
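
The three goal-splitting tactics mentioned above can be sketched as
a small recursive function. This is only an illustration under
assumed conventions: formulae are nested Python tuples tagged
"and", "or", "imp" and "iff", atoms are strings, and the name
split_goal is invented here; none of this is taken from Bledsoe's
program.

```python
from typing import List, Tuple, Union

Formula = Union[str, Tuple]


def split_goal(goal: Formula) -> List[Formula]:
    """Exhaustively split a goal into the subgoals to be proved separately."""
    if isinstance(goal, tuple):
        op = goal[0]
        if op == "and":
            # To show (P&Q), show P and show Q.
            return split_goal(goal[1]) + split_goal(goal[2])
        if op == "iff":
            # To show (P<->Q), show (P->Q) and (Q->P).
            return (split_goal(("imp", goal[1], goal[2]))
                    + split_goal(("imp", goal[2], goal[1])))
        if op == "imp" and isinstance(goal[1], tuple) and goal[1][0] == "or":
            # To show ((P+Q)->R), show (P->R) and (Q->R).
            _, (_, p, q), r = goal
            return split_goal(("imp", p, r)) + split_goal(("imp", q, r))
    return [goal]
```

Each subgoal returned would then be handed to the underlying proof
routine; the sketch does not attempt that step.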

As I see it, this sort of thing -- non-clausal format,
truth functional breakdown, splitting of the goal, and
domain-specific reduction -- is the first step away from
resolution procedures.

22 They are not available because they do not correspond to
primitive rules of inference in their system. Kalish &
Montague eventually introduce these as derived strategies.

F.  The  Retreat  from  Resolution,  II 

Further  breaks  from  pure  resolution  were  already 
underway,  headed  by  Bledsoe  and  people  surrounding  him  (to 
judge  from  the  literature).  In  1972  Bledsoe  et  al  replaced 
the  resolve  subroutine  of  his  1971  system  (discussed  in  the 
previous  section)  by  one  called  imply.  The  authors  claimed 
to  "have  thus  eliminated  resolution  altogether  from  [their] 
program,  replacing  it  by  an  'implication  method'  which 
[they]  believe  is  faster  and  easier  to  use...".  As  in  the 
1971  system,  there  are  two  parts  to  the  system:  the  proof 
system proper and a variety of (domain-specific) routines.
In the 1971 system these latter were called "reductions"
(for set theory), whereas in the 1972 system they are a set
of methods for simplifying and solving linear equations by
assigning sets and types to them. One of these methods is
called the "limit heuristic" -- whose application here is
restricted to proving limit theorems. But the strategy
behind it is of some more general interest. In the words of
the authors,

    Because the limit heuristic enables our program to
    prove many theorems about limits, we regard it as a
    rather interesting trick. But more interesting and
    important than the fact that it works some problems
    [sic] is the principle behind it. That principle
    might be stated:

        To establish a conclusion C from several
        hypotheses, among which is H, force H to
        contribute all it can towards establishing C and
        leave a remainder to be established with the
        help of other hypotheses.

    ...[I]f one can truly make H contribute all it can
    towards C, then H is not needed to establish the
    remainder. That is, a reduction in the number of
    hypotheses is achieved while a significant step in
    the proof is made.

This  guiding  principle  can  be  more  widely  applied  than  just 
in  this  limit  heuristic,  as  the  authors  acknowledge;  indeed 
it  could  be  adapted  to  the  proof  system  in  general  --  even 
to  pure  resolution,  where  it  would  be  some  kind  of 
depth-first  search.  We  shall  return  to  the  shortcomings  of 
this  general  principle  in  the  next  chapter. 

The proof system proper of (1972) contains two parts.
The first is the same as the (1971) system, namely a set of
"splitting" heuristics which break the main theorem to be
proved into subgoals. The second is the imply routine to
which these subgoals are now passed (instead of the resolve
subroutine as before). This subroutine has two arguments:
the formula C to be proved, and R (a "reserve" set of
formulae). The result of a call to imply is either a
substitution or nil. The latter indicates failure to
establish the subgoal. The routine attempts to find and
return the most general substitution s such that (R→C)s is
true. If s is the empty substitution, then imply returns
true.

A  formula  C  is  converted  to  a  quantifier  free 

"skolemized"  form  following  the  method  of  Wang  (1963), 2 3  and 

then  a  call  to  (imply  C  nil)  is  made.  Various  of  the  parts 

of  imply  look  at  the  form  of  C  to  decide  what  to  do  next. 

Some  of  these  rules  say  to  solve  other  problems.  Thus  if  C 

was  of  the  form  (H-* ( A-*B )  )  ,  the  relevant  rule  recursively 

calls  (imply  (H&A-^B)  nil).  So  for  this  example  we  now  look 

to  the  relevant  rule  which  says  that  if  (imply  (H-*B)  A) 

returns  s,  ,  then  (imply  (H&A->B)  nil)  returns  s,  ,  but  if 

(imply  (A-+B)  H)  returns  s2  ,  then  (imply  (H&A-*B)  nil)  returns 

S2 .  So  we  need  to  evaluate  (imply  (H^B)  A).  Assuming  H  and  B 

to  be  positive  and  non-complex,  this  succeeds  if  either  H  is 

tautologically  equivalent  to  B  (and  then  we  return  true)  or 

else  there  is  a  most  general  unifier  of  H  and  B  (in  which 

case  we  return  it).  Such  a  simple  case  would  only  happen 

when  H  itself  directly  implied  B  without  the  need  for  A.  In 

any  more  complex  case,  say  a  case  where  H  had  the  form 

(A-*B),  we  will  need  another  call  to  imply.  So  we  are  trying 

to  evaluate  (imply  ((A-*B)+B)  A);  the  relevant  rule  is 

"backwards  chaining".  We  need  to  evaluate  (imply  (B->B)  A)  -- 

which  yields  true  --  and  (imply  (A^-A)  NIL).24 

2  3  Thi s  is  not  clause  form  since  one  does  not  convert  first 
to  prenex  normal  form. 

24 In the more general case, the consequent of this last
formula will be the result of substituting the most general
unifier found from the previous call to imply. In the
current example the empty unifier was found, and so here we
did no substitution. Strictly speaking, the statement of
backwards chaining is: (imply ((A->B)->C) R) first calls

This last also yields true and is percolated backwards to
the first call to imply.

A careful examination of the rules of imply (1972: 36)
shows that, besides all the "rewriting" rules, there are the
following proof strategies. First, conditionals are proved
when a most general unifier of the antecedent and consequent
can be found. Second, a reductio-style proof is attempted in
the case where formulae of the form (H->~C) are being proved.
This is done by changing (imply (H->~C) R) to (imply
(H&C->nil) R). And finally there are the two strategies which
call the "reserve" R into play. The first is the "backwards
chaining" described above, and the second is that when the
formula to be proved is (~H->C) with reserve R we call (imply
(R->H v C) nil). An examination of proofs of any complexity at
all will show that backwards chaining is the most
commonly-attempted strategy, and that besides the finding of
a unifier, it is what really drives imply. This raises the
question of whether backwards chaining is a good choice of
strategies to rely upon almost exclusively. I discuss this
in the next chapter.
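
The first strategy listed above depends on finding a most
general unifier. As an illustration only, here is a minimal
Python sketch of unification with an occurs check. The term
encoding (tuples for compound terms, strings beginning with
"?" for variables) and all function names are assumptions
made for this sketch; they are not taken from the imply
program itself.

```python
def is_var(term):
    # Assumed convention: variables are strings starting with "?".
    return isinstance(term, str) and term.startswith("?")

def walk(term, subst):
    # Follow variable bindings until an unbound variable or a non-variable.
    while is_var(term) and term in subst:
        term = subst[term]
    return term

def occurs(var, term, subst):
    # True if var appears inside term under the current bindings.
    term = walk(term, subst)
    if term == var:
        return True
    if isinstance(term, tuple):
        return any(occurs(var, arg, subst) for arg in term)
    return False

def unify(a, b, subst=None):
    # Return a most general unifier as a dict, or None if none exists.
    subst = {} if subst is None else subst
    a, b = walk(a, subst), walk(b, subst)
    if a == b:
        return subst
    if is_var(a):
        return None if occurs(a, b, subst) else {**subst, a: b}
    if is_var(b):
        return None if occurs(b, a, subst) else {**subst, b: a}
    if isinstance(a, tuple) and isinstance(b, tuple) and len(a) == len(b):
        for x, y in zip(a, b):
            subst = unify(x, y, subst)
            if subst is None:
                return None
        return subst
    return None
```

In this sketch an empty dictionary plays the role of the
empty substitution, the case in which imply simply returns
true.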

Bledsoe & Bruell (1974) is an interactive theorem
prover, arranged along the same lines as the Bledsoe et al
(1972) system. Its expert domain is topology, and so it has
a number of "reduce" style techniques relevant to this area.

24 (cont'd) (imply (B->C) R). If this returns s1 it then calls
(imply (R->As1) NIL). If this returns s2, then the top-level
call returns (s1s2+s2) where neither s1 nor s2 are NIL. If
the second call failed but either of (imply (R->As1 v C) NIL)
or (imply (((A->B)&R)->As1) NIL) yields s2, then again the
top-level call returns (s1s2+s2).

While the main interest of the 1974 system is in the variety
of  ways  a  user  can  interrupt,  add  "axioms"  to  be  used, 
suggest  substitution  instances,  etc.,  there  are  some  parts 
of  it  relevant  to  the  current  discussion  of  theorem  proving 
strategies.

The  main  differences  between  the  1972  and  1974  systems 
are  these.  In  the  1972  system,  the  split  routine  simplified 
the  overall  goals,  but  not  any  of  the  subgoals  which  might 
be  set  up  after  the  proof  has  started.  Similarly,  the  reduce 
routine  was  applied  after  the  imply  routine  had  failed,  and 
then  imply  was  recalled.  All  this  was  handled  by  an  overall 
monitor  called  cycle.  The  current  system  makes  imply  be  the 
overall  monitor,  and  it  calls  the  split  and  reduce  routines 
as  needed.  One  further  kind  of  strategy  was  added  to  imply: 
ground  forward  chaining.  Forward  chaining  was  omitted 
earlier  because  it  produces  a  large  number  of  intermediate 
and  useless  clauses.  A  ground  forward  chain  occurs  when  the 
expression  being  chained  off  of  contains  only  constants. 
Furthermore,  since  this  system  now  interacts  with  a  human 
user,  certain  strategies  can  now  be  relegated  exclusively  to 
the user. Thus, there was before a "backtrack" strategy used
when calling (imply (H->(A&B)) R), which tried (imply (H->A)
R) and used the unifier s thereby found to try (imply (H->Bs)
R). But if this last failed, a backtrack was used to find
another unifier s1. In the current system, such failures are
handled by the human, who suggests other unifiers. The
current version now uses a "breadth first" search rather
than the earlier "depth first" search. And coupled with this
is  a  routine  which  tries  to  use  a  hypothesis  which  is  "like" 
the  desired  conclusion,  even  though  a  complete  match 
(unification)  cannot  be  made.  As  will  be  shown  in  the  next 
chapter,  it  is  not  obvious  that  this  is  a  wise  change.  It 
should  finally  be  added  here  that  the  portion  of  the  reduce 
strategy  which  substitutes  definitions  now  has  an  ordering 
imposed  on  the  terms  (from  "common"  to  "strange")  so  that 
"strange"  terms  get  defined  first  and  the  imply  routine 
tries  to  prove  with  this  before  attempting  to  prove  after 
defining  "common"  terms. 


IV. METHODOLOGICAL AND PRACTICAL PROBLEMS IN AUTOMATIC THEOREM PROVING


A.  Introduction 

As  can  be  seen  from  the  examples  cited  in  Chapter  I, 
the  different  uses  to  which  automatic  theorem  proving  can  be 
put  set  different  criteria  upon  the  way  the  automatic 
theorem  prover  is  to  be  organized.  For  example,  Polly 
Programmer's  verifier  sets  as  a  necessary  condition  that  the 
proof  checker  should  be  sound,  i.e.,  that  if  the  verifier 
says  the  program  is  correct  then  it  is  correct.  Polly  might 
also  wish  to  set  the  condition  that  the  verifier  be 
complete,  i.e.,  that  if  the  program  is  correct  then  the 
verifier  will  say  so.  So  long  as  these  conditions  are  met, 
Polly  has  no  other  cares  about  how  the  verifier  carries  out 
its  task;  natural  deduction,  resolution,  and  axiomatic 
systems  never  enter  her  mind.  Larry  Lazy,  on  the  other  hand, 
must  guarantee  that  his  automatic  program  synthesizer  will 
construct  the  relevant  proof  in  such  a  way  that  the  parts  of 
the  proof  correspond  to  standard  programming  constructs  of 
PASCAL. He is therefore not permitted to use just any
logical method. He might indeed be willing to give up
completeness and even soundness if the system were to
construct a correct PASCAL program in a large number of
cases and often gave "almost correct" programs.
Felicity Findout requires a system which will allow her to
obtain answers about who did such-and-such and how a
so-and-so series of actions is to be performed. She thinks that
the  sort  of  systems  initiated  by  Green  (1969)  most  naturally 
allow  for  this  sort  of  "answer  extraction",  and  therefore 
has  implemented  a  resolution  system  with  answer  extraction. 
Robbie  Robot's  actions  are  set  up  in  such  a  way  that  certain 
subgoals  need  to  be  stated  and  proved  along  the  way  to 
performing  the  overarching  task.  It  seems  natural,  in  such  a 
system,  to  use  natural  deduction  techniques  which  explicitly 
state  the  subgoals  as  part  of  the  proof,  e.g.,  as  done  in 
the  Kalish  &  Montague  system.  Finally,  in  the  area  of 
cognitive  studies,  which  possibly  includes  natural  language 
processing,  one  is  obligated  to  try  to  mirror  the  actual 
processes  an  ordinary  human  goes  through  in  attempting  to 
solve  intellectual  questions.  In  such  studies,  one  needs 
first  to  find  out  how  real  people  do  this  in  simple  cases 
and  build  one's  theory  from  there.  Anecdotal  evidence  (of 
the  sort  alluded  to  in  Chapter  II)  indicates  that  the 
majority  of  people  perform  logic  tasks  in  some  type  of 
natural  deduction  system,  often  explicitly  setting 
themselves  subgoals. 

In  the  field  of  automatic  theorem  proving  the 
overwhelming  movement  has  been  the  investigation  of 
resolution  systems  (with  ever  more  fancy  control 
structures).  So  nearly  universal  has  this  movement  been  that 
even  in  the  fields  where  it  might  not  seem  so  natural  to  use 
resolution  methods  (e.g.,  automatic  program  synthesizing), 
the strongest force has been the attempt to invent
resolution strategies which will perform the job (see Bibel
1979  and  Guiho  &  Greese  1980  for  example). 

The  major  emphasis  of  the  present  thesis  is  to 
construct  an  automatic  theorem  proving  system  using  natural 
deduction  techniques  so  as  to  investigate  the  intricacies  of 
such  systems.  I  shall  attempt  to  show  that  natural  deduction 
leads to straightforward implementation of
easy-to-understand strategies which correspond naturally to
human-like proof methods. The resulting proofs are not only
constructed in the same manner as people construct them, but
the intermediately difficult and very difficult proofs are
performed more efficiently than resolution systems perform
them. The conclusion I draw from such performance is that
the use of human-like strategies is, in the long run, the
best research direction for automatic theorem proving in the
future. Instead of trying to force resolution procedures to
account for those uses of automatic theorem proving where
other methods seem more natural, perhaps the time has come
to try to use natural deduction techniques in areas where
resolution procedures have achieved some success. At the
very least I hope to show that natural deduction techniques
are easy to implement on a computer and that therefore one
need not expend one's energy on trying to invent methods of
constructing new resolution methods so that those areas
where natural deduction is the obvious choice can be
treated. Instead, one can merely adopt the sort of automatic
theorem prover displayed in this thesis.

In general, my feelings about natural deduction systems
vs. resolution systems follow those expressed by Bledsoe
(1977). He thinks that "natural systems" are both easier for
human use and for machine use of knowledge bases.25 As far
as "human use" goes, Bledsoe points out that with natural
deduction systems one can bring mathematical knowledge (for
example) to bear in the same form as it is used in
mathematics, that the mathematician can easily recognize
places where such knowledge can be used, that the systems
are easier to design and work upon, and that such systems
are essential for mathematician-machine interaction. He also
finds them easier for machine use of knowledge along these
dimensions: they automatically limit the search by not
starting all proofs as a "syntactic search strategy" does;
they are a natural vehicle upon which to hang heuristics,
knowledge, and "semantic search strategies"; they make it
easier to combine procedures with deduction; and they solve
the contextual data base problem.26

25 Bledsoe's terminology does not always follow standard
usage.  He  includes  both  our  natural  deduction  systems  and 
the  Logic  Theorist  in  his  "natural  systems".  He  furthermore 
often  calls  such  systems  "semantic"  and  distinguishes  them 
from  those  with  "syntactic  search  strategies."  The  careful 
reader  of  the  last  two  Chapters  will  find  the  Logic  Theorist 
drastically  different  from  natural  deduction  systems, 
similar  only  in  that  one  can  easily  implement  such 
strategies  as  chaining  in  both  of  them.  Such  a  reader  will 
also  conclude  that  it  is  the  resolution  systems  which  are 
properly called "semantic". Indeed, the Logic Theorist and
the natural deduction system soon to be described are as
"syntactic" as could possibly be. And finally, given the
discussion  of  the  last  chapter  about  the  use  of  heuristics 
in  resolution  systems,  such  a  reader  would  be  chary  of 
claims  about  the  total  unsuitability  of  resolution  systems 
to  incorporate  "semantic  search  strategies." 

26 It is not clear how they do this last. Bledsoe (1977: 15)

Whatever one thinks about Bledsoe's specific terminology,
there  is  clearly  (or  so  it  seems  to  me)  a  point  to  be  made 
in  favour  of  natural  deduction  along  all  these  lines.  And  it 
is  this  feeling  I  intend  to  try  to  inculcate  in  the  reader 
by  my  development  (in  the  next  few  chapters)  of  a  natural 
deduction  system.  Generally  speaking,  I  intend  to 
investigate  how  people  perform  the  intellectual  task  of 
producing a proof, and try to write a program which
performs in the same way. In this I follow Bledsoe's lead,
who says (1977: 2):

The  author  was  one  of  the  researchers  working  on 
resolution  type  systems  who  "made  the  switch".  It 
was  on  trying  to  prove  a  rather  simple  theorem  in 
set theory by paramodulation and resolution, where
the  program  was  experiencing  a  great  deal  of 
difficulty  that  we  became  convinced  that  we  were  on 
the  wrong  track.  The  addition  of  a  few  semantically 
oriented  rewrite  rules  and  subgoaling  procedures 
made  the  proof  of  this  theorem,  as  well  as  similar 
theorems  in  elementary  set  theory,  very  easy  for  the 
computer.  Put  simply:  the  computer  was  not  doing 
what  the  human  would  do  in  proving  this  theorem. 

When  we  instructed  it  to  proceed  in  a  "human  like" 
way,  it  easily  succeeded. 

Indeed, my current work might be best viewed as trying to
get a good version of Bledsoe's pure proof strategies

26 (cont'd) claims only that resolution provers require one
data base for each clause.

working. Only after this is done, and thoroughly tested on
examples  from  pure  logic,  is  it  time  to  add  further 
heuristics  of  the  sort  called  domain-specific  reductions  by 
Bledsoe.  I  thus  want  to  ensure  that  the  domain-independent 
portion  works  as  desired  before  adding  anything  relevant  to 
domain-specific  strategies. 

B.  Some  Theoretical  Remarks  about  Theorem  Proving  by  Machine 

It  is  not  always  clear  what  particular  investigators 
want  out  of  their  theorem  provers.  One  might,  for  example, 
wish  to  use  computers  for  proving  theorems  of  mathematics, 
of  formalized  physics,  or  of  other  fields  for  which  there 
are  completely  formalized  theories.  The  ultimate  goal  of 
these  attempts  will  naturally  be  to  further  our 
understanding  of  the  fields  themselves  --  to  gather  new 
information  about  mathematics,  physics,  or  whatever.  Since 
the  underlying  logic  of  any  of  these  areas  is  taken  to  be 
classical,  one  might  first  wish  to  investigate  the  ability 
of computers to prove theorems in an uninterpreted logical
calculus. In this area, however, one might wish to
distinguish attempts to further our understanding of
logic per se from attempts to further our understanding
of how a logician thinks. In the former case one is
unconcerned with the actual method used in arriving at
theorems (other than the concern that it be a legitimate
method), while in the latter case one would wish to mirror
the logician's actual process of constructing a proof. It
seems to me that investigations into pure logic are not very
interesting  unless  it  be  for  the  latter  type  of  reason, 
because  there  is  little  new  information  to  be  gained  about 
what  the  theorems  of  classical  logic  are.27  As  Feigenbaum  & 
Feldman  (1963:  107)  put  it: 

The  fascination  with  mechanical  theorem  proving  ... 
lies  less  with  the  end  (the  production  of  theorems, 
perhaps  new  and  important)  than  with  the  means  (a 
thorough  understanding  of  the  organization  of 
information  processing  activity  in  mathematical 
discovery).  It  is  felt  that  understanding  these 
problem-solving  processes  is  an  important  step 
toward  the  programming  of  more  complex,  more  general 
problem-solving  processes  for  a  variety  of 
intellectual  tasks. 

As  I  said,  it  is  not  always  clear  what  investigators 
think  they  are  modelling  with  their  logic  programs.  Newell 
et  al  (1957)  start  their  article  with  the  apparent  claim 
that  they  wish  to  investigate  a  person’s  reasoning  process, 
as  when  they  say  that  their  research 

is  aimed  at  understanding  the  complex  processes 
(heuristics)  that  are  effective  in  problem-solving 
...  We  wish  to  understand  how  a  mathematician,  for 

27 The subject of what theorems can be proved in classical
logic was pretty thoroughly canvassed by Whitehead & Russell
in 1910-1912, and in modern elementary logic textbooks.
While various new and interesting theorems have been
discovered in the decades since, the truly exciting work in
logic has been done at the model-theoretic level, and not
within the logic itself.

example, is able to prove a theorem even though he
does  not  know  how,  or  if,  he  is  going  to  succeed. 

But if so, it is most unclear what the point is of looking
at the algorithmic methods at all (like their British
Museum Algorithm),28 since it is manifest that students do
not employ such methods. Wang's (1963) approach, on the
other hand, would seem to be an investigation into the
abstract system of logic, since he professes unconcern for
any heuristics or strategies which might be used by
students. Instead he wishes to employ methods that guarantee
proofs for any decidable subset of logic (p.96). But if so,
it is not obvious why he should be so opposed to the Newell
et al approach -- given that they are interested in a model
of a different sort of thing altogether, nor why he should
be interested in the "practical feasibility" (for people) of
using one set of connectives rather than another (p.97).
Investigators employing resolution procedures are
schizophrenic on this issue: on the one hand they are
unanimous in their claim that resolution methods are not
what people use for constructing proofs; but on the other


28 One should point out here that the British Museum
Algorithm of enumerating all proofs is not so bad as Newell
et al (1957) would have us believe. In fact Siklossy et al
(1973)  have  presented  such  a  program  and  showed  that  it 
could  find  all  the  proofs  the  Logic  Theorist  could  find  (and 
more  quickly)  plus  some  others  of  Whitehead  &  Russell 
(1910).  The  reason  for  this  is  that  the  proofs  of  the 
theorems  of  Chapter  II  of  Whitehead  &  Russell  are  so  simple 
as  to  take  only  one  or  two  steps.  Of  course  the  method  bogs 
down  after  two-step  proofs.  No  three-step  proof  was  found 
before  memory  was  exhausted.  Perhaps  one  should  examine  the 
proof  in  Appendix  I  of  the  associativity  of ' ^  to  decide 
whether it is plausible to suppose that any breadth-first
approach  can  work  on  this. 




hand  such  procedures  are  advocated  for  use  in  trying  to 
simulate  human  problem-solving  systems  such  as  robotic 
planning  and  natural  language  systems. 

As  an  attempt  to  model  a  normal  (logic)  student,  the 
Logic  Theorist  has  the  problem  of  simulating  one  who  is 
learning  some  axiomatic  system.  Almost  every  person  (whether 
professional  logician  or  student)  finds  axiomatic  systems 
very  difficult.  If  one  wants  to  mirror  ordinary  logical 
reasoning,  one  would  do  better  to  look  at  how  people  learn 
to  do  proofs  in  one  of  the  other  versions  of  logic. 
Furthermore  if  one  is  interested  in  how  they  learn  logic, 
one  should  reject  the  "semantic"  systems  also.  There  are  two 
reasons  for  this.  The  first  is  that  the  semantic  systems  are 
not "really" logic.29 I quote here from Georgacarakos &
Smith (1978: xiv):

In  keeping  with  our  aim  of  theoretical  soundness,  we 
have  sharply  distinguished  between  the  semantical 
and  the  syntactical  correlates  of  the  logical 
concepts  we  study  throughout  the  text.  We  introduce 
the  technique  of  tree  construction  as  a  semantical 
device,  the  aim  of  which  is  to  discover  counter¬ 
interpretations  for  invalid  argument  forms.  Many 
authors  regard  trees  as  syntactical  devices,  and  of 


29 This claim is over-strong for two reasons: (a) these
semantical systems are, in a sense, logic -- as the quoted
text makes clear, and (b) it is undeniably true that
students do use semantical considerations in formulating
"real" logic proofs. (In this last regard see Reiter 1973).
However, as I see "real" logic, it is syntactical -- and
that presupposes that it is a pattern matching task.

course in a sense they are (they involve
manipulation  of  symbols).  However,  the  correct 
purpose  of  tree  construction  is  semantical  in  that 
it  is  to  be  used  as  a  device  to  find  possible 
counterinterpretations.

Recalling  Chapter  II,  we  note  that  resolution  systems, 
although  they  might  not  explicitly  manipulate  trees,  are 
nonetheless  a  form  of  semantical  system  designed  to  find 
counterinterpretations, and so this argument works equally
well  against  considering  them  to  be  "real"  systems  of  logic. 
The  second  reason  to  reject  the  resolution  systems  as  an 
account  of  a  person's  logical  abilities  is  that  they  lend 
themselves  too  easily  to  methods  which  are  beyond  the  ken  of 
ordinary  people.  If  one  really  wishes  to  study  how  ordinary 
people  perform  logical  analysis,  one  should  use  some  natural 
deduction  technique  such  as  the  Kalish  &  Montague  method 
outlined  in  Chapter  II.  Even  the  most  optimistic  of  the 
resolution  theorists,  Chang  &  Lee  (1973),  are  willing  to 
concede  that  resolution  techniques  are  too  complex  and  time 
consuming  for  ordinary  people  to  use.30 

A  reason  of  a  different  sort  for  rejecting  the 
resolution  systems  is  technical:  there  simply  is  no  good  way 
to  construct  proofs  in  these  systems.  It  is  to  arguments  of 
this  sort  that  we  now  turn. 


30 Anderson's (1973) review of Chang & Lee lists their
optimism about the suitability of resolution for mechanical
theorem proving as the major shortcoming of the book.

C. Some Problems with the Resolution Strategies

The  problems  with  resolution  can  be  broken  into  two 
types:  (a)  certain  strategies  are  incomplete,  (b)  other 
strategies  are  exponentially  difficult.31  While  I  wish  here 
to  concentrate  on  the  latter,  I  should  first  talk  a  bit 
about  the  former. 

The  simplest  sorts  of  resolution  --  simplest  from  both 
the  conceptual  and  computational  points  of  view  --  are  input 
and  unit  resolution.  We  have  already  seen  that  they  are  not 
complete,  but  there  is  even  in  these  cases  something  more  to 
be said about implementing them. Ask yourself: is input
resolution a finite procedure? Does it always terminate
either in □ or in there being no more resolutions to
perform? Well, no -- strictly speaking. Not until we have
some way of saying "do not perform a given resolution more
than once". This means that track must be kept of which
resolutions have already been performed. (Alternatively, if
a  resolvent  is  identical  to  an  already-present  clause,  do 
not  add  it.  Quit  when  each  member  of  the  input  has  been 
checked  and  found  to  generate  no  new  resolvents.)  Quite 
obviously,  this  already  makes  the  implementation  of  even  the 
simple  input  resolution  a  task  that  requires  some 


31 Of course, unless P=NP, all strategies are exponential (in
terms of the number of steps required to produce a proof,
relative to the size of the premises and conclusion set), at
least in the worst case, since the unsatisfiability problem
is co-NP complete even for the sentential calculus. I
wish to suggest that these systems are exponential in even
the average case, and that there is every reason to believe
that natural deduction is better. Further discussion appears
below.

considerable overhead in terms of checking these things. It
is  simply  not  true  that  "pure  resolution"  can  be  implemented 
in  the  simple  and  straightforward  manner  certain  authors 
seem  to  claim.  Complaints  that  natural  deduction  methods  in 
contrast  to  resolution  methods  need  to  keep  track  of  a  wide 
variety  of  rules  of  inference,  that  they  do  not  have  simple, 
uniform  logical  statements,  and  that  they  do  not  have  a 
uniform proof-completion recognition criterion, are simply
not  well-taken.  (Cf.  Sandford  1980:  pp.  2-3). 
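
The bookkeeping just described can be made concrete with a
small propositional sketch in Python. It is an illustration
under stated assumptions, not a reconstruction of any
published prover: clauses are encoded as sets of string
literals, "~" marks negation, and the function names are
invented here. A resolvent is added only if it is not
already present, and the loop quits when a pass yields no
new resolvents.

```python
def negate(literal):
    # "~P" <-> "P"; the "~" prefix is an assumed encoding of negation.
    return literal[1:] if literal.startswith("~") else "~" + literal

def resolvents(c1, c2):
    # All binary resolvents of two propositional clauses.
    out = set()
    for literal in c1:
        if negate(literal) in c2:
            out.add(frozenset((c1 - {literal}) | (c2 - {negate(literal)})))
    return out

def input_resolution(input_clauses):
    # Every step uses at least one input clause as a parent.
    inputs = {frozenset(c) for c in input_clauses}
    seen = set(inputs)        # record of every clause generated so far
    frontier = set(inputs)
    while frontier:
        new = set()
        for clause in frontier:
            for side in inputs:
                for r in resolvents(clause, side):
                    if not r:
                        return True          # empty clause reached
                    if r not in seen:        # do not add a clause twice
                        seen.add(r)
                        new.add(r)
        frontier = new        # quit when a pass adds nothing new
    return False
```

Without the `seen` record the loop would regenerate the same
resolvents indefinitely on inputs that have no input
refutation, which is the overhead the text points to. The
second case in the test below also illustrates the
incompleteness of input resolution: the clause set is
unsatisfiable, yet no input refutation exists.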

Besides  the  obviously  incomplete  strategies  such  as 
input  and  unit  resolution,  there  are  less  obviously 
incomplete  examples.  The  "semantic"  resolution  methods 
mentioned  in  Chapter  III  which  use  ordered  clauses  are  also 
not  complete  (see  Chang  &  Lee  1973:  p.  116).  And  Boyer’s 
(1971)  "lock  resolution",  while  itself  complete,  seems 
incapable  of  incorporating  any  of  the  standard  strategies 
for  simplifying  the  search  space  without  losing 
completeness.  Thus,  for  example,  one  cannot  even  eliminate 
tautologies from an unsatisfiable set of clauses (nor use
set of support, etc.) and be guaranteed still to generate □
by  this  method.  Perhaps  fancier  strategies  for  keeping  track 
of  "lock  numbering"  in  this  method  will  allow  this 
elimination.  But  at  present  this  is  an  open  question.  (See 
Sandford  1980:  p.  225). 

As  I  see  it,  the  real  problem  with  pure  resolution  is 
that  it  cannot  distinguish  the  conclusion  to  be  proved  from 
the premises -- they are both represented by sets of
clauses. And if there is a contradiction to be found, we
have  no  reason  to  suspect  that  it  is  only  because  of  the 
addition  of  the  negation  of  the  conclusion  rather  than 
amongst  the  premises.  Contrast  this  with  natural  deduction 
where  an  explicit  distinction  is  made  between  what  is  to  be 
proved  and  what  are  antecedent  lines.  To  be  sure,  once  we 
have  assumed  a  negation  in  natural  deduction,  such  an 
assumption  is  treated  as  any  other  antecedent  line  is.  But 
we  still  know  what  we  are  attempting  to  prove,  and 
resolution  provers  cannot  do  this  since  they  do  not  keep 
explicit  track  of  the  conclusion.  A  limited  attempt  to 
distinguish  between  premises  and  conclusion  is  made  in 
resolution  by  the  addition  of  the  set  of  support  strategy 
(Wos  et  al  1965).  But  even  here  there  is  no  real  track  kept 
of  what  it  is  we  are  trying  to  prove,  just  that  it  is  in  the 
set  (S-T),  where  T  is  the  set  of  support  and  S  are  all  the 
clauses  of  the  argument.  This  problem,  of  not  knowing 
exactly  what  we  are  attempting  to  prove,  is  most  clearly 
demonstrated  by  the  kind  of  examples  Bledsoe  (1971)  used  to 
motivate  his  splitting  heuristics.  If  we  are  trying  to  prove 
(say)  a  conjunction,  it  is  clearly  preferable  to  notice  that 
we  are  done  when  each  conjunct  separately  is  shown.  Even  in 
a  resolution  framework  it  would  be  easier  to  prove  first 
that (S & ~P1) is unsatisfiable and then that (S & ~P2) is
unsatisfiable rather than that the complex (S & (~P1 v ~P2)) is
unsatisfiable. (This is precisely the move that Bledsoe 1971
makes: after splitting, the subproblems are sent to a
resolution subroutine). And such considerations are even
more  important  when  we  consider  proofs  involving  complex 
formulae  such  as  biconditionals  which  themselves  contain 
many  subformulae. 
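
The equivalence behind the splitting move can be checked on
small cases with a brute-force truth-table test. The Python
sketch below is only an illustration: the clause encoding
(sets of string literals with "~" for negation), the
function names, and the example premises S are all
assumptions introduced here, and the exhaustive search
stands in for whatever refutation procedure would actually
be used.

```python
from itertools import product

def holds(literal, valuation):
    # Truth value of a literal; "~" prefix is the assumed negation marker.
    if literal.startswith("~"):
        return not valuation[literal[1:]]
    return valuation[literal]

def satisfiable(clauses):
    # Exhaustive check over all valuations of the atoms that occur.
    atoms = sorted({lit.lstrip("~") for clause in clauses for lit in clause})
    for values in product([False, True], repeat=len(atoms)):
        valuation = dict(zip(atoms, values))
        if all(any(holds(lit, valuation) for lit in clause) for clause in clauses):
            return True
    return False

def refuted_whole(premises, conjuncts):
    # S & (~P1 v ~P2 v ...): the negated conjunction as a single clause.
    return not satisfiable(premises + [{"~" + p for p in conjuncts}])

def refuted_split(premises, conjuncts):
    # One separate, smaller problem S & ~Pi for each conjunct.
    return all(not satisfiable(premises + [{"~" + p}]) for p in conjuncts)

# Hypothetical premises: A, A -> P1, A -> P2.
S = [{"A"}, {"~A", "P1"}, {"~A", "P2"}]
```

Both routes give the same verdict, but each split problem
contains only unit negations of the goal, so a prover can
report success on one conjunct before starting the next.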

Other  ways  of  organizing  the  resolution  search  do  not 
fare  well  either.  Consider  linear  resolution  (whether  or  not 
augmented  by  ordered  clause  heuristics).  While  pure 
resolution  ("level  saturation  resolution")  invokes  new 
clauses  at  a  rate  exponential  in  the  number  of  clauses  of 
the  previous  level,  linear  resolution  appears  to  be 
restrictive.  Since  one  of  the  parents  will  always  be  the 
most  recent  resolvent,  it  is  clear  that  once  a  resolution  is 
started,  the  number  of  new  clauses  at  a  given  level  will 
never  be  more  than  the  number  already  present  (since  that  is 
the  upper  bound  on  the  number  of  possible  resolutions 
available)  and  so  the  total  number  of  clauses  at  a  given 
level  is  just  the  level  plus  the  number  of  original  clauses. 
But  this  gain  is  only  an  illusion.  The  completeness  of 
linear  resolution  only  entails  that  there  is  some  linear 
resolution which will generate □. In fact then, to find it
one  must  try  all  the  possible  ways  to  start  a  resolution, 
and  at  each  level  all  the  possible  ways  to  continue  it.  So 
the  problem  is  once  again  seen  to  be  exponential  in  the 
number  of  original  clauses.  For,  not  only  do  we  have  to 
consider  all  the  possible  ways  to  start  (a  polynomial 
problem)  but  also  (for  each  started  resolution  chain)  we 
have to consider every variant of each step. That is, for
each generated resolvent in each chain, if more than one
clause  will  resolve  with  it,  we  must  keep  track  of  all  the 
possible  chains. 
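
The search just described can be sketched as a bounded
depth-first enumeration of linear chains. This Python
fragment is a propositional illustration under assumed
conventions (clauses as sets of string literals, "~" for
negation, invented function names, an explicit depth bound);
it is not drawn from any of the systems discussed. It tries
every top clause and, at each step, every input clause or
ancestor that resolves with the current centre clause, and
it counts how many chain nodes it visits.

```python
def negate(literal):
    # "~P" <-> "P"; the "~" prefix is an assumed encoding of negation.
    return literal[1:] if literal.startswith("~") else "~" + literal

def resolvents(c1, c2):
    # All binary resolvents of two propositional clauses.
    out = set()
    for literal in c1:
        if negate(literal) in c2:
            out.add(frozenset((c1 - {literal}) | (c2 - {negate(literal)})))
    return out

def linear_search(clauses, max_depth):
    # Returns (found, visited): whether some linear chain of at most
    # max_depth resolutions ends in the empty clause, and how many
    # chain nodes were visited before the search stopped.
    inputs = [frozenset(c) for c in clauses]
    visited = 0

    def extend(centre, ancestors, depth):
        nonlocal visited
        visited += 1
        if not centre:
            return True               # empty clause: refutation found
        if depth == max_depth:
            return False
        # Side clauses may be input clauses or earlier centre clauses.
        for side in inputs + ancestors:
            for r in resolvents(centre, side):
                if extend(r, ancestors + [centre], depth + 1):
                    return True
        return False

    found = any(extend(top, [], 0) for top in inputs)
    return found, visited
```

Each individual chain stays short, but the nested loops over
side clauses and resolvents are where the alternatives
multiply; printing `visited` for growing clause sets and
depth bounds gives a rough picture of that growth.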

Similar  remarks  can  be  made  about  the  semantic 
strategies.  Of  course  every  setting  will  divide  the  clauses 
into  two  non-empty  classes:  those  true  in  the  setting  and 
those  false  in  the  setting.  Obviously,  unless  the  setting  is 
chosen  with  care,  there  is  no  gain.  And  the  problem  of 
choosing  a  setting  is  itself  an  exponentially  explosive  one, 
for  if  it  were  easier  then  the  original  problem  with 
resolution  would  not  arise.  (The  number  of  settings  is 
exponential  in  the  number  of  distinct  literals,  so  finding 
the  optimal  setting  is  exactly  equivalent  to  showing  the 
original  set  of  clauses  to  be  inconsistent.) 
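
The size of the space of settings can be shown in the same hedged way. The sketch below assumes propositional clauses written as sets of signed integers and a hypothetical function name, partitions_by_setting. It simply enumerates every truth assignment and splits the clauses into those true and those false under it.

```python
from itertools import product


def partitions_by_setting(clauses):
    """For every setting (truth assignment), split the clauses in two.

    Returns a list of (setting, true_clauses, false_clauses) triples,
    one for each of the 2**n assignments to the n distinct atoms.
    """
    atoms = sorted({abs(lit) for clause in clauses for lit in clause})
    result = []
    for values in product([False, True], repeat=len(atoms)):
        setting = dict(zip(atoms, values))

        def holds(clause):
            # A clause is true if at least one of its literals is true.
            return any(setting[abs(lit)] == (lit > 0) for lit in clause)

        true_part = [c for c in clauses if holds(c)]
        false_part = [c for c in clauses if not holds(c)]
        result.append((setting, true_part, false_part))
    return result
```

For the inconsistent set {1}, {-1, 2}, {-2} there are four settings, and each falsifies at least one clause. Checking that every setting does so is just the brute-force test of inconsistency, which is why picking the best setting cannot be cheaper than the original problem.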

It  seems  to  me  that  there  is  no  easy  way  around  these 
difficulties  for  resolution.  Now,  it  may  be  that  natural 
deduction  techniques  introduce  an  exponentially  explosive 
problem  also  --  indeed,  one  would  expect  so  if  the  simple 
problem  of  joint  propositional  satisfiability  is  an 
NP-complete  problem.  But  natural  deduction  parcels  the 
difficulties  into  different  areas:  some  problems  become 
recognizing  when  a  subproof  is  done,  others  become  the 
generation  of  subgoals,  and  still  others  become  akin  to 
resolution-based systems' "blindly try all the rules of
inference".32

32 Of course, not all resolution-based systems can be
obviously  treated  as  "blind"  in  this  sense.  For  example,  the 
"semantic  setting"  strategies  (including  set  of  support) 
are, in a sense, goal-driven. But this appearance is shown

Now, in the worst cases one expects natural deduction to
perform  no  better  than  resolution.  But  natural  deduction  has 
a  variety  of  distinct  parts  and  strategies  for  proving 
things, rather than just the "keep resolving until □ is
generated." Each of these components has an area in which it
excels, and one therefore expects that, in the "average
case", natural deduction will perform better than
resolution.

In  any  case  it  seems  worthwhile  to  try,  and  the  system 
to  be  described  later  is  a  first  attempt  to  show  the  power 
of  natural  deduction  and  the  simplicity  with  which  a  variety 
of  strategies  can  be  incorporated  into  such  a  non-resolution 
system. 

D.  Resolution  in  General 

The  resolution  method,  when  not  augmented  by  any 
heuristic  search  control,  is  very  inefficient,  time 
consuming, and storage consuming. I quote here from Kowalski
(1978: 163):

The  search  space  determined  by  unrestricted 
application  of  the  resolution  rule  is  highly 
redundant.  Redundancy  can  be  avoided,  at  the  cost  of 
flexibility,  by  restricting  resolution  to  top-down 

32 (cont'd) to be illusory for the reason mentioned above: the
discovery  of  the  correct  "setting"  is  itself  an 
exponentially  difficult  problem.  Other  apparently 
goal-driven  systems  face  the  same  difficulty  because 
resolution-based  strategies  are  forced  to  lump  all  the 
different  aspects  of  theorem  proving  into  just  the  one  rule 
of inference: resolution.

or bottom-up inference.

The  general,  widespread  dissatisfaction  with  resolution  as  a 
general  purpose  inference  method  can  be  clearly  seen  by 
looking  at  any  volume  containing  papers  where  resolution  is 
being  discussed  from  anything  but  a  "technical,  theorem 
proving  via  resolution"  framework.  For  example,  a  random 
sampling  from  the  articles  of  IJCAI  5  (1977)  yielded  the 
following:

[Our  project]  consists  of  writing  a  computer  program 
which  can  solve  a  wide  variety  of  simple  mechanics 
problems  stated  in  English...  Our  methodology  is  to 
find  general,  justifiable,  inference  rules  which  can 
be  combined  with  mechanics  problems.  As  is  well 
known,  when  rules  like  these  are  run  on  a  general 
inference  machine  [a  resolution  prover]  the  result 
is  often  a  combinatorial  explosion.  Rules  are 
combined  in  unexpected  ways  and  the  search  for  a 
solution  is  developed  along  unreasonable  paths. 

(Bundy  1977:  496) 

A  considerable  amount  of  recent  work  in  theorem 
proving  has  been  concerned  with  methods  for 
increasing  the  power  of  inferences  which  can  be  made 
in  special  cases.  This  appears  to  be  in  response  to 
the  widespread  recognition  that  general  purpose 
theorem  provers,  particularly  those  using  resolution 
as  their  inference  rule,  have  extreme  difficulty 
with particular aspects of proofs for which there
are fairly effective algorithms and heuristics.

(Harrison 1977: 529)

[There  are]  two  serious  disadvantages,  in  the 
authors'  opinion,  of  resolution.  First,  resolution 
is  a  deduction-oriented  rule,  and  there  are 
generally  vastly  more  inferences  that  can  be  deduced 
from  a  set  of  clauses  than  are  used  in  the 
refutations  produced,  even  when  very  restrictive 
strategies  are  used;  we  point  to  the  low  penetrance 
factors  cited  in  the  literature.  Second,  while  the 
separation  of  variables  insures  that  the  general 
resolvent  will  subsume  families  of  resolvents  of 
instances,  such  generality  puts  the  inference  in  a 
strictly  local  context  --  aside  from  possible  future 
resolutions,  there  is  no  connection  with  any  other 
deduction  performed  in  the  search  so  far. 

(Henschen  &  Evangelist  1977:  541) 

The  interest  in  automatic  theorem  proving,  which  was 
very  high  in  the  AI  community  in  the  late  sixties, 
has  decreased.  One  of  the  reasons  was  the 
impossibility  by  now  of  obtaining  a  theorem  prover 
of  wide  applicability.  In  particular  "complete" 
search  strategies  based  on  resolution  have  been 
heavily  criticized.  In  fact  it  is  felt  that  the 
crucial  problem  is  not  to  have  an  efficient  prover 
but  to  be  able  to  communicate  with  it  and  to  drive 
it. It is also felt that efficiency improvements
cannot increase significantly the performance.

(Martelli  &  Montanari  1977:  543) 

E.  Problem  Reduction  Format 

What  such  researchers  as  those  mentioned  in  the  last 
section  are  interested  in  are  the  kinds  of  uses  of  theorem 
proving  cited  in  Chapter  I  that  have  to  do  with  planning, 
problem  solving,  natural  language  inference,  and  knowledge 
representation.  And  the  claim  is  that  resolution  systems  are 
not  well  suited  to  these  uses.  Perhaps  we  should  try  to  see 
what  kind  of  system  would  be  well  suited. 

In  these  areas,  the  overall  scheme  of  knowledge 
representation  and  natural  language  inference  can  be  seen 
like  this:  first,  there  is  a  representation  of  "the  current 
state  of  the  world"  (a  "current-world-data-base",  CWDB); 
second,  some  new  datum  is  added  to  the  CWDB;  and  third,  the 
theorem  prover  is  called  to  find  out  what  other  data  must  be 
added  to  the  CWDB  (or  deleted)  to  accommodate  the  new  datum. 
In  problem  solving,  there  might  be  a  request  to  try  to  find 
a  sequence  of  steps  (of  alterations  to  the  CWDB)  which  will 
have  the  consequence  of  adding  the  desired  goal  state  of  the 
problem to the CWDB. The ways in which one might want to
invoke a theorem prover are clear here: one might want first
to check whether the desired goal state isn't already
implied by the CWDB, or one might have incorporated into the
CWDB such propositions as "if x is at place p1 at time t1,
and y is at p1 at t1, then x is next to y" and "if x is next
to y at t1, and y is moved at t2 and t1<t2, then x is not
next to y at t2" and might have proposed an activity of
moving y. The theorem prover should discover which
particular statements about the relative locations of x and
y to add to and delete from the CWDB.

What  I  take  to  be  the  crucial  feature  of  these 
approaches  is  their  "problem  reduction  format."  The  typical 
development  of  problem  reduction  is  to  regard  this  format  as 
being  equivalent  to  an  AND/OR  tree,  where  the  root  node  is 
the  goal  to  be  finally  achieved.  Nodes  on  a  tree  are  either 
AND  nodes  --  which  means  that  every  one  of  the  subgoals 
under  this  node  X  (i.e.,  the  nodes  X  dominates)  must  be 
achieved  in  order  for  X  to  be  achieved,  or  OR  nodes  --  which 
means  that  achieving  one  of  the  nodes  under  X  is  sufficient 
for  achieving  X.  The  leaf  nodes  of  an  AND/OR  tree  represent 
"primitive  actions"  (primitive  at  least  from  the  point  of 
view  of  the  tree  as  thus  far  expanded).  So,  an  AND/OR  tree 
for  a  certain  goal  G  might  be  (representing  AND  nodes  by 
italics  and  OR  nodes  in  roman  type): 


This  representation  corresponds  to  having  premisses  in  an 
argument  of: 


A1 -> G
A2 -> G
A3 -> G
(A11 & A12 & A13) -> A1
(A31 & A32) -> A3
A131 -> A13
A132 -> A13
A133 -> A13
(A1311 & A1312) -> A131

One tries to find, in the CWDB, some further ("atomic")
assertions (such as A133, A11, A12) which -- when added to
the  above  premises  --  will  allow  the  deduction  of  G.  Or 
alternatively,  the  theorem  prover  might  attempt  to  construct 
a  proof  of  G  from  these  premises  in  order  to  discover  what 
further  statements  would  have  to  be  added  in  order  for  the 
proof  to  go  through  and  then  recommend  that  these  further 
statements  be  added.  (In  this  last  I  have  in  mind  systems 
like  Robbie  Robot’s  of  Chapter  I,  where  upon  noticing  that 
the  goal  cannot  be  achieved  unless  the  CWDB  contains  some 
further  assertions  the  system  performs  the  relevant  actions 
so  as  to  alter  the  CWDB  in  the  requisite  manner). 
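
The first of these two uses can be sketched in a few lines. The sketch assumes that the premises above are read as the rules A1 -> G, A2 -> G, A3 -> G, (A11 & A12 & A13) -> A1, (A31 & A32) -> A3, A131 -> A13, A132 -> A13, A133 -> A13 and (A1311 & A1312) -> A131. The names RULES, derivable and solve are hypothetical. It is an illustration of AND/OR reduction over such rules, not a description of any existing system.

```python
# Each rule is (antecedents, consequent). Several rules with the same
# consequent form an OR node; several antecedents in one rule form an AND node.
RULES = [
    (["A1"], "G"), (["A2"], "G"), (["A3"], "G"),
    (["A11", "A12", "A13"], "A1"),
    (["A31", "A32"], "A3"),
    (["A131"], "A13"), (["A132"], "A13"), (["A133"], "A13"),
    (["A1311", "A1312"], "A131"),
]


def derivable(goal, facts, rules=RULES):
    """Forward chaining: add consequents until nothing new appears."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for body, head in rules:
            if head not in known and all(b in known for b in body):
                known.add(head)
                changed = True
    return goal in known


def solve(goal, facts, rules=RULES):
    """Problem reduction: a goal is solved if it is a known fact, or if
    some rule for it (OR) has all of its antecedents solved (AND).
    Assumes the rule set is acyclic, as it is here."""
    if goal in facts:
        return True
    return any(all(solve(b, facts, rules) for b in body)
               for body, head in rules if head == goal)
```

With the atomic assertions A133, A11 and A12 taken from the CWDB, both functions report that G can be deduced. With A133 withheld, both report failure, and that is the point at which a system of the second kind would recommend adding the missing assertion.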

I  think  the  problem  reduction  format  is  extremely 
natural  as  a  representation  of  how  people  actually  try  to 
solve  problems  (although  as  we  shall  see  in  a  future 
section,  this  precise  version  of  it  suffers  from  some 
technical  difficulties).  I  would  argue  here  that  natural 
deduction  systems  of  logic  are  more  clearly  suited  to  this 
method of problem solving, natural language inference, and
knowledge representation (as sketchily described here) than
are either axiomatic or "semantic" (including resolution)
logics. The overall view I take is that (1) such problem
solving systems cry out for a general, domain independent
supervisory system of the natural deduction type which can
be supplemented with a domain specific "expert" set of
premises (or rules of inference) relevant to that domain,
(2) "pure expert" systems are inappropriate on the grounds
that  they  do  not  reasonably  mirror  what  is  common  to  all  the 
problem  domains,  and  (3)  resolution  systems  (in  particular) 
are  the  wrong  general  supervisory  system  because  of  their 
inherent  inefficiency. 

It  seems  clear  that  if  one  accepts  anything  like  the 
problem  reduction  format  as  being  a  general  representation 
of  human  problem  solving  and  as  being  a  reasonable  way  to 
model  this  by  a  computer,  one  is  going  to  demand  some 
general  method  by  which  the  system  can  figure  out  (a)  how  to 
proceed  in  attaining  the  desired  goal,  and  (b)  when  the  goal 
has  been  achieved.  Given  the  problem  reduction  format,  this 
amounts  to  precisely  the  natural  deduction  of  the  Kalish  & 
Montague sort, where the goal is explicitly stated and the
CWDB contains premises for the desired proof. An
implementation of the Kalish & Montague system will itself
set up appropriate subgoals (given an overall goal) and will
construct the proper sequence of steps to achieve each
subgoal. Of course if the CWDB is very large, one will want
some way of paring down the number of premises actually in
use. The method discussed in Chapter V is my answer to this
problem.

Clearly,  "pure  expert"  systems  are  inappropriate 
because  they  do  not  give  an  overall  picture  of  the  human 
problem  solving  ability.  Such  systems  move  the  pure  logical 
ability  (inference)  into  the  programming  language  itself, 
and  are  bound  to  engender  confusions  about  what  logic  --  a 
method  of  reasoning  --  really  is.  Hayes  (1977)  argues 
convincingly  that  (p.  563) 

The  interactions  sanctioned  by  logic  between 
assertions  are  far  richer  and  more  complicated  than 
the  interactions  between  procedures  in  a  procedural 
language (any procedural language). Thus, explicit
recursive procedure calls (LISP) are more restricted
than explicit coroutine calls (SIMULA), these more
restricted than pattern-directed coroutineing
(CONNIVER), these more restricted than resolution
(which allows both caller and callee to have
variables bound during the matching process) and
finally resolution itself is a special case of
general logic inference rules of instantiation and
cut.

His following remark, however, does not seem to be
necessarily true unless one has in mind no strategies for
restricting the search space:

In  each  case,  the  more  general  interaction  pattern 
allows more interactions and hence yields a more
complex search space, and a more difficult search
problem. 

F.  Getting  into  Clause  Form 

Very  few  arguments,  whether  from  a  technical  area  such 
as  mathematics  and  programming  or  from  "everyday  life"  such 
as  planning  and  natural  language,  are  stated  in  clause  form. 
To use resolution methods then, one either must develop new
"non-clausal" methods or else convert one's representations
into clause form. In Chapter III I mentioned Murray's (1982)
NC-Resolution which operates on non-clausal formulae. This
system  is  extraordinarily  complicated  but  complete,  and  can 
be  augmented  by  semantic  resolution  methods  like  those 
discussed  in  Chapter  III.  However,  it  is  not  obvious  that 
any  of  the  other  strategies,  e.g.,  eliminating  tautologies, 
works  well  on  this  system  due  to  the  difficulty  in 
determining  tautologousness  in  a  non-clausal  format.  It 
seems  to  me  that  the  complexity  of  the  resulting  method  is 
such  that  there  is  very  little  reason  to  prefer  it  to 
ordinary  resolution.  After  all,  the  alleged  reason  to  prefer 
non-clausal  format  is  that  most  problems  are  not  given  in 
clausal  form  (by  people);  so  can  there  be  any  rationale  for 
a  method  completely  alien  to  people's  proof  methods  but 
which  clings  to  the  non-clausal  form  preferred  by  people?  In 
any  case,  Murray's  system  does  not  really  use  formulae  in 
the  form  given  by  mathematics  or  natural  language.  The 
formulae are converted to a "skolemized form" but without
first converting to prenex normal form. This is the method
described by Wang (1963) and also used in the Bledsoe
systems discussed in Chapter III. It involves determining
"negative" and "positive" occurrences of the quantifiers and
replacing  the  variables  bound  by  some  of  the  quantifiers  by 
appropriate  skolem  functions  of  the  other  quantifiers.  It 
would seem that Murray's system combines the worst of all
worlds: formulae must first be converted into a form
different from that normally encountered (unlike the system
described later in this thesis), the proof method is
extremely complicated (unlike the resolution method), and
few of the well-understood heuristics are applicable to it
(unlike both resolution and natural deduction).

One  should  point  out  that  some  amount  of  time  is 
required  to  convert  formulae  into  another  form,  whether  this 
other form be clausal or the non-clausal "skolemization" of
Wang, Murray and Bledsoe. For sufficiently complex formulae
this can be a non-trivial amount of time, although it may be
trivial compared to the time involved in actually proving
the theorem. People who discuss the efficiency of their
theorem prover, especially those who give actual times (as
for instance when they compare their prover's times with the
well-known times of the Logic Theorist), ought to be
required to include the cost of converting to their
normalized form. This is especially relevant when comparing
resolution provers to those like the Logic Theorist or the
system to be described later which do not use any normalized
form.

G.  Bledsoe's  "natural"  systems:  A  critique 

As  indicated  in  the  last  chapter,  Bledsoe's  systems 
have  undergone  a  variety  of  changes.  The  earlier  systems 
were  not  much  removed  from  resolution  systems  (with  the 
exception  of  the  splitting  strategies).  The  later  systems 
are  more  in  the  spirit  of  true  natural  deduction. 
Nonetheless,  at  least  from  the  published  accounts,  these 
later  systems  are  not  very  good  knights  to  be  carrying  the 
natural  deduction  banner. 

I have already noted that Bledsoe et al (1972) had
mentioned that their system is incomplete. A further
discussion  of  this  issue  is  carried  on  below  in  Chapter  VII, 
where  it  is  shown  that  the  system  to  be  displayed  later  in 
this  thesis  is  not  incomplete,  at  least  not  in  this  way. 

Two  other  shortcomings  of  the  Bledsoe  systems  should  be 
brought  up  at  this  point.  First,  the  earlier  systems  (1971, 
1972)  used  only  backward  chaining  to  generate  subgoals 
(other  than  those  subgoals  generated  by  the  splitting 
heuristics).  As  is  noted  in  his  (1974),  this  is  very 
inefficient  at  finding  appropriate  subgoals;  and  indeed, 
seems  only  to  be  of  real  use  when  a  negation  has  been 
assumed (that is, when (imply (H -> C) R) was changed to
(imply (H & ¬C -> nil) R)). But in this case surely it would be
just  as  easy  to  switch  to  a  resolution  proof.  The  1974 
system, incorporating "ground forward chaining", presumably
does a better job at this (but because of lack of
comparative  data,  one  cannot  be  sure). 

The  earlier  versions  used  a  depth-first  style  of 
search,  at  least  in  the  "reduction"  portion  of  the  systems. 
As  the  quotation  in  Chapter  III  has  it:  "if  one  can  make  a 
hypothesis  contribute  all  it  can  towards  establishing  a 
conclusion,  then  it  can  be  ignored  in  trying  to  prove  the 
remainder."  This  is  not  a  good  principle,  at  least  not 
without  some  modification.  Consider  for  example  linear 
resolution.  We  have  already  seen  that  in  general  one  needs 
to  re-use  centre  clauses,  even  though  they  "have  contributed 
all they can" towards generating □. In natural deduction
systems  the  same  phenomenon  occurs.  One  may  need  to  use  a 
formula  to  generate  some  intermediate  conclusion,  and  then 
later  use  this  intermediate  conclusion  to  further  the  proof 
using  other  hypotheses,  and  finally  come  back  to  the 
original  hypothesis  again.  Such  cases  are  quite  common  in 
fact. Suppose for example one wished to prove C: Every
person's mother's mother is female, from the premises P1:
Every person has a mother, P2: If someone is a mother of
anyone, then that someone is a female person. It seems that
the natural way to prove this would be: let x be a person,
then by P1 there is a y which is a mother of x. Hence by P2,
y is a female person, and hence a person. But now we go back
to P1 to note that y has a mother z, and by P2 z is a female
person and hence female. It is just not true here that what
we have done is "used P1 as much as we can and then
discarded it." (Similar remarks hold here for P2 as well).
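
The order in which the two premises are used in this proof can be traced mechanically. The following is only a sketch under assumptions made for illustration: facts are tuples, m(t) is a made-up name for "a mother of t", and the function name prove_female_grandmother is hypothetical. It records each appeal to a premise so that the re-use is visible.

```python
def prove_female_grandmother(x="a"):
    """Derive female(m(m(x))) from person(x), logging each premise used."""
    facts = {("person", x)}
    used = []

    def p1(t):
        # P1: every person has a mother; m(t) names one such mother.
        assert ("person", t) in facts
        y = f"m({t})"
        facts.add(("mother", y, t))
        used.append("P1")
        return y

    def p2(y, t):
        # P2: whoever is a mother of anyone is a female person.
        assert ("mother", y, t) in facts
        facts.add(("female", y))
        facts.add(("person", y))
        used.append("P2")

    y = p1(x)     # x has a mother y
    p2(y, x)      # y is a female person, and hence a person
    z = p1(y)     # back to the first premise: y has a mother z
    p2(z, y)      # z is a female person, and hence female
    return ("female", z) in facts, used
```

The log that comes back is P1, P2, P1, P2. The first premise is needed again after the second has been applied, so a strategy that discards a hypothesis once it has been used cannot complete this proof.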

It  is  perhaps  for  reasons  such  as  this  that  in  1974 
Bledsoe  &  Bruell  abandoned  depth  first  search  in  favour  of  a 
breadth  first  search.  But  such  a  strategy  also  is  not  good. 
After  all,  one  of  the  points  of  natural  deduction  is  that 
"once  you're  on  the  right  track,  keep  it  up  until  you're 
done".  And  the  reason  this  can  be  implemented  in  a  natural 
deduction  system  of  the  Kalish  &  Montague  sort  is  that  one 
can  tell  when  "you're  on  the  right  track",  because  of  the 
structure  of  the  current  subgoal.  The  reason  Bledsoe's 
systems  cannot  adopt  this  is  because  their  whole  aim  was  to 
return  substitution  instances  for  variables  from  subproofs. 
But  once  one  rids  oneself  of  this  preconception  of  what  is 
the  point  of  a  natural  deduction  system,  one  can  have  the 
limited  depth  first  search  so  obviously  called  for. 

H.  Natural  Deduction  and  the  Problem  Reduction  Format 

One  problem  with  natural  deduction,  it  might  be 
alleged,  is  that  natural  deduction  systems  incorporate  too 
many  rules  of  inference,  that  they  are  not  elegant,  and  that 
one  should  favour  resolution  systems  with  their  one  rule  of 
inference  (augmented  by  a  substitution  mechanism)  --  even  if 
this  tends  to  lead  to  less  "natural"  proofs  and  a  great 
expansion  of  the  search  space.  In  response  to  this,  one 
should  point  out  that  taste  (in  elegance  or  simplicity)  is  a 
matter  of  taste.  If  one  wishes  to  mirror  human  problem 
solvers, the number of rules is not a reasonable measure of
success. Furthermore, if the measure is of programming ease,
one  should  first  construct  a  natural  deduction  system  prover 
to  see  whether  or  not  it  is  easier  to  construct  a  resolution 
or  a  natural  deduction  prover.  In  fact,  the  prover  to  be 
described  later  in  this  thesis  is  quite  a  bit  more 
complicated  to  program  than  the  example  resolution  provers  I 
have  seen.  But  then  it  can  do  a  lot  more  than  they  can.  What 
has  not  been  shown  is  that  given  provers  of  equal  capacity, 
resolution  is  easier  to  program  than  natural  deduction.  And 
a  further  factor  to  consider  here  is  the  ease  with  which  a 
specific  style  of  theorem  prover  can  be  extended  when  one 
discovers  that  it  is  incapable  of  proving  some  class  of 
problems.  In  this  regard,  the  following  quotation  from 
Nevins  (1974)  is  relevant. 

A  point  worthy  of  stress  is  that  a  deductive  system 
is not "simpler" merely because it employs fewer
rules  of  inference.  A  more  meaningful  measure  of 
simplicity  is  the  ease  with  which  heuristic 
considerations  can  be  absorbed  into  the  system. 

Another  alleged  shortcoming  with  natural  deduction 
theorem  proving  is  the  consequence  of  these  three  beliefs 
which  seem  to  be  widely  held:  first  that  natural  deduction 
is  equivalent  to  the  problem  reduction  format,  second  that 
the  problem  reduction  format  is  equivalent  to  AND/OR  tree 
representation  of  problems,  and  third  that  AND/OR  trees  are 
not  complete  in  the  sense  of  allowing  the  solution  to  every 
true problem-solving sequence (see Loveland 1978, Chapt. 6).
Of course, if these three beliefs are true, then something
is  drastically  wrong  with  natural  deduction.  But  it  is  false 
that  they  are  all  true.  The  terms  'natural  deduction', 
'problem  reduction  format',  and  'AND/OR  tree'  are  technical 
terms  meaning  different  things  to  different  theorists.  I 
have  already  stated  what  I  take  'natural  deduction'  to 
include,  viz.,  Kalish  &  Montague's  system.  And  I  have 
already  stated  what  I  take  'AND/OR  tree'  to  mean.  Given 
these  definitions,  what  are  we  to  say  about  the  above  three 
beliefs?  Two  things,  I  think.  First,  it  must  be  that  the 
term  'problem  reduction  format'  is  being  used  equivocally. 
The  sense  in  which  it  is  the  same  as  natural  deduction  is 
just  not  the  same  sense  as  that  in  which  it  is  the  same  as 
AND/OR  trees  (as  defined  earlier).  A  second  thing  that  might 
be  said  is  that  the  explanation  of  AND/OR  trees  given  is  too 
restrictive,  and  that  a  correct  explanation  will  allow  them 
to  completely  describe  the  search  space. 

Let  us  look  at  these  alternatives.  There  is  a 
well-established  sense  in  which  the  problem  reduction  format 
just  is  the  AND/OR  trees  as  defined.  To  reduce  a  problem 
means  to  break  it  into  simpler  problems  such  that  the 
solving  of  all  the  simpler  problems  is  equivalent  to  solving 
the  original  (AND  node)  or  the  solving  of  one  of  the  simpler 
problems  entails  the  solving  of  the  original  (OR  node).  In 
this  sense  of  'problem  reduction  format',  natural  deduction 
of  the  sort  here  described  properly  includes  it  but  is  not 
the same as it. For, only the SPLITTING heuristics have this
property. In particular, the "reductio" method of boxing and
cancelling cannot be put in this framework. In this
sense of 'problem reduction format', natural deduction is
not equivalent to problem reduction. But there is another
sense of 'problem reduction format' in which any method that
includes  the  use  of  AND/OR  trees  is  a  problem  reduction 
format  method.  In  this  sense  natural  deduction  is  equivalent 
to  problem  reduction.  But  obviously  then  the  second  belief 
is  wrong:  problem  reduction  is  not  then  equivalent  to  AND/OR 
tree  representation. 

Perhaps,  though,  one  should  expand  what  an  AND/OR  tree 
is.  The  real  problem  with  allowing  AND/OR  trees  to  represent 
problems is that one only allows nodes to represent "simple
states", and one represents the kinds of relationships
between these "simple states" by the kind of node (AND or
OR). But this means that the only types of assertion we
allow will be either ((A1 & A2 & ... & An) -> B) or
(A1 -> B) & (A2 -> B) & ... & (An -> B). And the only rule of inference
recognized  is  MP  in  the  following  form:  if  one  can  solve  all 
(AND  node)  or  some  (OR  node)  subproblem,  then  one  has  solved 
the  problem.  Clearly  not  every  logically  valid  formula  can 
be  represented  in  this  way  (there  are  no  negations)  nor  can 
every  valid  argument  be  solved  this  way. 

Indeed,  the  AND/OR  representation  is  equivalent  to  Horn 
sets.  And  we  have  seen  above  that  only  some  of  the  arguments 
we  wish  to  represent  can  be  put  in  this  way.  Loveland's 
(1978: Chapt. 6) solution to this is to expand the
definition of AND/OR tree so as to allow negated nodes of an
AND/OR tree and to claim that a branch of the tree is solved
if it contains both a simple node and its negation. This in
effect allows all formulae (since they can all be defined in
terms of &, ∨, and ¬ of atomics), and admits all proofs (by
the addition of the "reductio" proof method). An alternative
approach would be to allow arbitrarily complex nodes (any
truth functional or quantificational formula) and note that
branches like

C

(A & ¬A)

are automatically valid. (One needn't use indirect proof
here; MP and conditional proof will suffice.) Or one might
incorporate  both  devices  into  a  larger  system,  as  is  done  in 
the  theorem  prover  that  will  be  explained  in  the  next  few 
chapters.  What  this  shows  is  that  it  was  not  natural 
deduction  which  was  suspect,  but  rather  that  the  traditional 
AND/OR  tree  representation  is  not  an  appropriate  way  to 
describe  the  problem  reduction  format  that  natural  deduction 
employs.

V.  THE  GUTS:  DATA  STRUCTURES  AND  LOW  LEVEL  ROUTINES


A.  Introduction 

In  this  chapter  I  discuss  the  underlying  data 
structures  and  low-level  subroutines  used  by  the 
theorem-proving  system  THINKER.33  The  next  chapter  will  be 
devoted  to  a  discussion  of  the  high-level  heuristics  that  do 
the  actual  driving  of  THINKER.  There  is  one  part  of  the  data 
structures  whose  discussion  will  be  postponed  until  Chapter 
VII,  when  I  talk  about  difficulties  that  THINKER,  as 
described  here,  has  in  proving  certain  "tricky  theorems". 

The  source  language  of  THINKER  is  the  SPITBOL  dialect 
of SNOBOL4. This is in keeping with my general attitude
(expressed  in  Chapter  IV)  that  "real  logic"  is  a  matter  of 
pattern  matching  amongst  strings,  as  opposed  to  an  attempt 
to  try  to  "understand"  the  problem  posed  and  make  use  of 
that  "knowledge"  to  infer  the  truth  of  some  other  item. 
SNOBOL  is  a  string  manipulation  language  which  allows  one  to 
define  "patterns"  as  they  occur  in  strings.  As  we  shall  see, 
it  is  these  patterns  that  we  attend  to  in  constructing  a 
proof.  SNOBOL  also  has  a  built-in  data  type  of  TABLE,34 
which  makes  use  of  a  hash  function  to  index  storage 

33 These lower level routines and data structures were
designed and developed by Dan Wilson. We have discussed, on
numerous occasions, the issues involved with THINKER and the
exact form of these underlying structures. A paper we
jointly wrote on the topic was presented to an
interdisciplinary conference in January 1982 in Fiji, and is to
be published as Pelletier & Wilson (1982).

34 Tables may be thought of as content addressable memories.
See Griswold et al (1968) for further definition of SNOBOL4.

locations by means of strings. THINKER often asks the
question "Does formula 'X' occur thus far in the proof?",
and so one wants an efficient way to find this out. If 'X'
is  stored  as  a  string,  then  one  can  use  a  TABLE  to  look  up 
where  'X'  has  occurred.  Furthermore,  by  means  of  the  DATA 
construct,  one  can  build  data  types  of  arbitrary  complexity; 
and  as  we  shall  see,  this  is  used  quite  extensively  in 
THINKER.  Control  in  SNOBOL  is  rather  primitive;  it  is 
sequential  with  branching  allowed  on  the  Success  or  Failure 
of  a  pattern  match.  Of  course,  since  patterns  can  be  quite 
complex  (including  recursive  patterns),  many  different 
things  can  happen  in  the  course  of  finding  out  whether  a 
particular  pattern  has  matched  or  failed  to  match  a  given 
string.  SNOBOL  allows  unrestricted  recursion  --  a  feature 
the  heuristics  make  heavy  use  of.  As  far  as  I  am  aware,  the 
only  feature  of  SPITBOL  that  THINKER  uses  which  is  not  also 
a  feature  of  SN0B0L4  is  the  function  LEQ  ("lexically 
equal " ) . 

B.  Explicit  Manipulation  of  Formulae 

As  hinted  in  the  last  paragraph,  THINKER  stores  its 
formulae  as  strings  rather  than  the  list  structures  or  tree 
structures  of  other  theorem  provers.  Strings  are  easier  to 
copy  and  manipulate  than  are  lists  and  trees.  There  is  a 
pattern BF which is a recursive definition of
well-formed-formula; it succeeds if the formula entered as a premise or
conclusion is well-formed. If not, a correction is
requested. All the other patterns assume that the formula
being  operated  on  is  well-formed.  There  are  three  further 
patterns  USPLIT,  QSPLIT,  and  BSPLIT,  which  (given  a  formula) 
locate the main connective ('¬' for USPLIT, the quantifier
for QSPLIT, or the binary connective for BSPLIT). This is
very fast to do in SPITBOL: since we assume the formula to
be well formed, it is merely a matter of "balancing"
parentheses by the primitive function BAL. The results of
this pattern match are: (a) if it fails, the formula was
atomic, (b) if it succeeds, then the main connective is
stored in the global variable OP, (c) if it is a BSPLIT, the
global variables LOP and ROP contain the two subformulae,
(d) if it is either USPLIT or QSPLIT, then ROP contains the
(unnegated or unquantified) formula, and (e) if it is
QSPLIT, then the global variable VAR contains the variable
of quantification. The three patterns are disjunctive parts
of  the  pattern  SPLIT  which  applies  the  three  in  turn  until 
one  succeeds  or  they  all  fail.  It  is  in  this  way  that 
THINKER  finds  out  "what  kind"  of  formula  it  is  currently 
operating  upon. 
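
The behaviour of SPLIT can be illustrated with a short sketch. It is written in Python rather than SPITBOL, and it is not THINKER's code: the ASCII spellings ('~' for negation, '->', '<->', '&', '|' for the binary connectives, '(Ax)' and '(Ex)' for the quantifiers), the single-letter variables, and the function name split are all assumptions made for the illustration. Like the patterns described above, it presumes the input is already well-formed.

```python
QUANTIFIERS = ("A", "E")              # assumed spellings: (Ax) and (Ex)
BINARY_OPS = ("<->", "->", "&", "|")  # assumed ASCII connectives


def split(formula):
    """Return (op, lop, rop, var), or None when the formula is atomic."""
    # USPLIT: a leading negation sign.
    if formula.startswith("~"):
        return ("~", None, formula[1:], None)
    # QSPLIT: a leading quantifier prefix such as (Ax) or (Ex).
    is_quantified = (
        len(formula) > 4
        and formula[0] == "("
        and formula[1] in QUANTIFIERS
        and formula[2].islower()
        and formula[3] == ")"
    )
    if is_quantified:
        return (formula[1], None, formula[4:], formula[2])
    # BSPLIT: balance parentheses; the connective at depth 1 is the main one.
    if formula.startswith("(") and formula.endswith(")"):
        depth = 0
        for i, ch in enumerate(formula):
            if ch == "(":
                depth += 1
            elif ch == ")":
                depth -= 1
            elif depth == 1:
                for op in BINARY_OPS:
                    if formula.startswith(op, i):
                        return (op, formula[1:i], formula[i + len(op):-1], None)
    # All three alternatives failed: the formula is atomic.
    return None
```

The returned tuple plays the role of the globals OP, LOP, ROP and VAR; a result of None corresponds to the failure of the pattern match on an atomic formula.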

The  next  kind  of  data  structure  is  the  occurrence 
table.  There  are  two  types:  general  and  specific  occurrence 
tables.  These  tables  record  information  about  the  occurrence 
of  terms  (variables  and  constants  --  THINKER  does  not  have 
arbitrary  function  terms)  in  the  proof  being  constructed. 
These  are  TABLES,  and  so  indexed  by  the  term  in  question.  In 
a  general  table,  the  value  stored  at  each  index  is  a  count 
of how many times that term has occurred in the proof; in a
specific table, the value is a first in, last out stack each
entry of which is the "proof level" (i.e., how many
embeddings of uncancelled 'show' there currently are) at the
time when the term was added to the table. The functions PPC
and PFV add to and delete from general and specific tables
respectively. There are the following occurrence tables.

CLIST  --  a  general  table  of  all  occurrences  of  currently 
antecedent  constants  (including  those  in  premises) 

VLIST  --  a  general  table  of  all  occurrences  of  all 
variables 

FVLIST -- a specific table of all occurrences of
currently antecedent free variables

PremVLIST -- a general table of all occurrences of
variables in premises

PremFVLIST  --  a  specific  table  of  all  occurrences  of 
free  variables  in  premises 
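
The difference between the two kinds of table can be made concrete with a small Python sketch. The helper names below are hypothetical (they are not the PPC and PFV routines themselves), and Python dictionaries stand in for SPITBOL TABLEs: a general table maps a term to a count, a specific table maps a term to a stack of proof levels.

```python
from collections import defaultdict


def add_general(table, term):
    """General table: count one more occurrence of the term."""
    table[term] += 1


def del_general(table, term):
    """General table: forget one occurrence; drop the entry at zero."""
    table[term] -= 1
    if table[term] <= 0:
        del table[term]


def add_specific(table, term, level):
    """Specific table: push the proof level at which the term appeared."""
    table[term].append(level)


def del_specific(table, term):
    """Specific table: pop the most recent level; drop the entry when empty."""
    table[term].pop()
    if not table[term]:
        del table[term]


vlist = defaultdict(int)     # a general table, in the manner of VLIST
fvlist = defaultdict(list)   # a specific table, in the manner of FVLIST
add_general(vlist, "x")
add_specific(fvlist, "x", 1)
add_specific(fvlist, "x", 2)
```

After these calls vlist records one occurrence of 'x', while fvlist records that 'x' became free at proof levels 1 and 2, the later level on top of the stack.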

Various  patterns  are  used  to  get  information  about  a  given 
formula,  and  many  of  these  patterns  have  side  effects  of 
entering  and  deleting  things  into  (from)  the  occurrence 
tables  (by  calling  PPC  and  PFV). 

CF  (pattern)  given  an  input  formula  and  term,  produces  a 
list  of  the  positions  of  unbound  instances  of  the  term 
in  the  formula. 

PF  (pattern)  matched  against  a  formula  makes  appropriate 
entries  into  CLIST,  VLIST  and  FVLIST  as  side  effects. 

DF  (pattern)  matched  against  a  formula  makes  appropriate 
deletions from CLIST and FVLIST as side effects.

HF  (pattern)  matched  against  a  formula  (premise)  makes 
appropriate  entries  into  PremVLIST,  PremFVLIST,  and 
CLIST  as  side  effects. 

FF  (pattern)  matched  against  a  formula  produces  the 
general  occurrence  table  FVL  for  all  free  variables  in 
the  formula. 

These  patterns  use  a  recursive  descent  parser  while  adding 
the  relevant  information  to  the  tables. 

C.  The  Proof  Matrix 

Two  global  variables  are  CURLINE,  the  current  line  of 
the proof, and CURLEVEL, the current "show level" (the depth
of embedding of uncancelled 'show' lines). The proof matrix
PRMAT is an n × 6 two-dimensional array, where n denotes the
number  of  lines  in  the  proof.  The  first  four  entries  for 
each  line  are:  the  formula  at  that  line,  the  annotation  of 
that  line  (i.e.,  the  justification  for  that  line  --  numbers 
of  the  antecedent  lines  used  and  the  rule  of  inference 
employed  in  obtaining  that  line),  the  CURLEVEL  of  that  line, 
and  a  boolean  value  indicating  whether  that  line  is  (is  not) 
antecedent.  (When  a  line  is  first  entered  into  PRMAT  then  it 
is antecedent if it is not a 'show' line, but later
additions may make it become non-antecedent). The final two
entries will be discussed in Chapter VII; they are not
required to understand the standard operation of THINKER.

The function ADDPROOF(X,Y) adds formula X with
annotation Y to PRMAT, along with the CURLEVEL and the
correct  boolean  value  (and  computes  the  remaining  two 
fields).  If  the  formula  ADDPROOFed  is  not  a  goal,  ADDPROOF 
calls  the  function  ADDANTE  (discussed  below);  if  it  is  a 
goal,  ADDPROOF  was  called  from  the  function  ADDGOAL 
(discussed  below). 

The function SHOWN() backs through the proof matrix up
to the last 'show', altering (by DELANTE) the boolean field
of each formula it encounters (making antecedent lines
non-antecedent and making the 'show' formula it encounters
become antecedent). It simultaneously makes changes to the
specific and general occurrence tables. It then calls
ADDANTE on this (now cancelled) goal and pops the GOALSTACK
(see below).

Finally,  there  is  the  function  PRINTPROOF()  which 
prints  out  the  completely  formatted  proof. 

D.  Antecedent  Lines  and  Templates 

The data type ANTE is a triple: a line number in the
proof matrix, the show level of that line, and a pointer
variable (to the next ANTE). A template is a schematic
representation of a formula, schematic in that some feature
has been omitted and replaced by '@'. Thus 'F(x,@)' is a
string which represents the class of actual formulae that
have 'x' in the 'F' relation to something. '(@→P)'
represents the class of formulae that have '→' as main
connective and 'P' as consequent. THINKER uses such
templates in the following way: when a formula -- say
'((A&B)→C)' -- is added to the proof matrix and hence is
antecedent, THINKER also remembers the templates it can form
from it, here '((A&B)→@)' and '(@→C)', thereby remembering
that '(A&B)' is the antecedent of some conditional statement
and that 'C' is the consequent of some conditional
statement. THINKER keeps templates for the following:

1. For every formula, it forms all templates generated by
replacing exactly one free term (constant or free
variable) by '@'. ('P?(x,y)' therefore has two
templates: 'P?(@,y)' and 'P?(x,@)', but no 'P?(@,@)'
template.)

2. Where Q is a quantifier, formulae of the form '(Qα)φα'
generate the template '(Q@)φ@'. (The replacement of '@'
for 'α' in 'φα' observes scope requirements, of course).
In addition to this "quantifier template" a formula like
'(Ex)Fxy' would generate the free variable template
'(Ex)Fx@', naturally.

3. A formula whose main connective is binary is of the form
'(φ∘ψ)' and generates the templates '(@∘ψ)' and '(φ∘@)'.

No other formulae generate templates. One easily
distinguishes templates from actual formulae by the presence
of '@'; thus they can be treated on a par and distinguished
only when necessary. The data type TEMP has two fields: a
token field and a link to the next TEMP. The token is what
the '@' is replacing in that template.
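
Template formation under rules 1 and 3 can be sketched as follows. This is an illustrative Python fragment, not THINKER's SPITBOL: the ASCII connectives ('->', '<->', '&', '|'), the restriction of rule 1 to atomic formulae written as P(t1,...,tn), and both function names are assumptions. Each function returns a mapping from template to token, the token being what '@' replaced.

```python
BINARY_OPS = ("<->", "->", "&", "|")  # assumed ASCII connectives


def binary_templates(formula):
    """Rule 3: blank out each side of the main binary connective."""
    if not (formula.startswith("(") and formula.endswith(")")):
        return {}
    depth = 0
    for i, ch in enumerate(formula):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth == 0 and i < len(formula) - 1:
                return {}  # outer parenthesis closed early: not binary
        elif depth == 1:
            for op in BINARY_OPS:
                if formula.startswith(op, i):
                    lop, rop = formula[1:i], formula[i + len(op):-1]
                    return {"(@" + op + rop + ")": lop,
                            "(" + lop + op + "@)": rop}
    return {}


def term_templates(atom):
    """Rule 1, restricted here to an atomic formula P(t1,...,tn)."""
    name, args = atom[:-1].split("(", 1)
    terms = args.split(",")
    templates = {}
    for i, term in enumerate(terms):
        blanked = terms[:i] + ["@"] + terms[i + 1:]
        templates[name + "(" + ",".join(blanked) + ")"] = term
    return templates
```

For '((A&B)->C)' the first function yields the two templates discussed above, with tokens '(A&B)' and 'C'; for 'P(x,y)' the second yields 'P(@,y)' and 'P(x,@)' but never 'P(@,@)', since exactly one term is replaced at a time.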


ANTELINES  is  a  table  whose  index  is  a  formula  or  a 
template  and  whose  value  at  that  index  is  a  stack.  Entries 
in this stack are either ANTEs or TEMPs. Intuitively, if one
indexes  ANTELINES  by  a  regular  formula,  the  values  will  be  a 
stack  of  places  and  show  levels  of  where  that  formula  occurs 
in  the  proof  matrix,  while  if  one  indexes  ANTELINES  by  a 
template,  one  will  get  a  stack  of  the  tokens  from  which  that 
template  was  generated. 

The function FINDANTE(X) succeeds if there is an
antecedent line 'X'; it returns the ANTE. The function
FINDTEMP(X) succeeds if there is an antecedent line with the
template 'X'; it returns the actual formula. The functions
ADDANTE(X) and DELANTE(X) add to and delete from ANTELINES.
ADDANTE  stores  the  formula  as  an  ANTELINE,  creates  all  the 
TEMPs,  stores  them  via  ADDTPLATE,  creates  the  free  variable 
templates  and  stores  them,  determines  the  major  connective 
and  CSTACKS  the  formula  (see  below),  and  constructs  all  the 
occurrence  tables  (CLIST,  VLIST,  FVLIST).  DELANTE  takes  out 
everything  ADDANTE  has  put  in,  except  VLIST  never  has 
anything  deleted  (it  was  the  list  of  all  variables  that  ever 
occurred  in  the  proof). 

E.  Goals 

There is a stack called GOALSTACK which contains the
"goals", i.e., the formulae to be proved -- the 'show'
lines. (A stack is used since THINKER is always trying to
prove the "most current" goal). Each element of GOALSTACK
has seven fields: an index into the proof matrix (the number
of  the  line  of  the  proof  it  is),  the  "show  level",  the 
actual  formula,  a  table  of  free  variable  templates  for  this 
goal,  two  pointers  for  keeping  track  of  the  location  in  the 
existentially  quantified  SIMPLE  ring  (see  below),  and  a 
pointer  to  the  next  element  of  the  stack.  The  pointers  to 
the  SIMPLEs  are  merely  bookkeeping  pointers  to  make  sure 
that  we  never  do  an  existential  instantiation  of  a 
particular  formula  more  than  once  under  a  given  show  level. 

ADDGOAL(X) stacks X onto the GOALSTACK and ADDPROOFs X.
TESTGOAL(X)  finds  out  whether  X  is  already  a  goal  (so  as  not 
to  add  it  again  as  a  goal).  GETGOAL()  retrieves  the  formula 
that  is  the  most  recent  goal.  Recall  that  SHOWN()  pops 
(deletes)  the  most  recent  goal  and  ADDANTEs  that  formula. 

F.  SIMPLEs  and  FINDing 

Items of the data type SIMPLE (for simplification) are stored in
a doubly-linked circular list. Each SIMPLE contains four
fields: a formula, the number of the proof line it occurs
on, a forward link and a backward link. There is one such
ring for each of the connectives, so that (for example)
every antecedent formula that has an '&' as its major
connective appears on the '&' ring. Formulae are added to
one  side  of  the  entry  point  to  the  ring  by  means  of  CSTACK 
(recall  that  ADDANTE  calls  CSTACK)  and  are  deleted  from  the 
ring by DELCSTACK, which is called by DELANTE.
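
A ring of SIMPLEs can be sketched in a few lines of Python. The class and function names are stand-ins for the DATA type and for CSTACK and DELCSTACK, and the use of a dedicated sentinel node as the "entry point" is an assumption of this sketch, not something stated about THINKER.

```python
class Simple:
    """One ring node: a formula, its proof line, and two links."""

    def __init__(self, formula, line):
        self.formula = formula
        self.line = line
        self.next = self   # a lone node is linked to itself
        self.prev = self


def cstack(entry, formula, line):
    """Insert a new node on one side of the ring's entry point."""
    node = Simple(formula, line)
    node.prev, node.next = entry, entry.next
    entry.next.prev = node
    entry.next = node
    return node


def delcstack(node):
    """Unlink a node from its ring."""
    node.prev.next = node.next
    node.next.prev = node.prev


def walk(entry):
    """Go once around the ring, skipping the entry point itself."""
    node = entry.next
    while node is not entry:
        yield node
        node = node.next


negation_ring = Simple(None, 0)          # sentinel entry point (assumed)
first = cstack(negation_ring, "~~A", 1)
second = cstack(negation_ring, "~B", 2)
```

Because insertion and deletion only touch the neighbouring links, both are constant-time, which matters when whole blocks of lines are removed at once as a 'show' is cancelled.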

These rings are used by a set of functions called
FINDX, where 'X' is replaced by the name of some rule of
inference.  Thus  FINDDNS  ("double  negations")  goes  around  the 
negation  ring;  for  each  formula  it  encounters  it  tests 
whether  that  formula  starts  with  two  negations.  If  it  finds 
one  such,  it  checks  to  see  if  the  un-double-negated  formula 
is  already  antecedent  (if  so,  go  to  next  member  of  the 
ring);  if  it  is  not,  it  ADDPROOFs  the  un-double-negated 
formula,  which  in  turn  CSTACKs  this  new  formula  onto  the 
appropriate  ring  (unless  it  is  atomic),  calls  ONESTEP  on 
this  new  formula  (see  the  next  chapter)  and  if  ONESTEP  fails 
goes  on  to  the  next  formula  in  the  ring,  until  the  ring  has 
been  exhausted.  Of  course  a  quadruply-negated  formula  will 
be  doubly-unnegated  and  CSTACKed  onto  the  end  of  the  same 
ring  as  a  double  negation  by  FINDDNS,  and  eventually  will  be 
completely unnegated. In a similar vein, FINDBCS circles the
'↔' ring, finding '(φ↔ψ)'. It checks to see if '(φ→ψ)' is
antecedent; if not, it ADDPROOFs it, which in turn CSTACKs it
onto the '→' ring, and then calls ONESTEP on it. If this
fails it does the same to '(ψ→φ)'. FINDCONTRA goes around
the '¬' ring and looks to ANTELINES for the unnegated
formula. If it finds it, it is "repeated" as the last line
of the proof and SHOWN() is called. FINDMPS goes around the
'→' ring always looking for whether the arrow's antecedent
is in ANTELINES; if so, it checks whether the consequent is
in ANTELINES and, if not, ADDPROOFs it and calls ONESTEP.
FINDALLEI goes through the existentially quantified ring and
performs an existential instantiation on each, to a new variable
(if this has not already been done as an antecedent line).
The  results  are  CSTACKed,  but  ONESTEP  is  not  called  because 
it  cannot  succeed.  (The  new  formula  has  variables  distinct 
from  any  in  the  proof,  so  it  cannot  immediately  yield  the 
latest  'show').  FINDUI  goes  around  the  universally 
quantified  ring  and  instantiates  to  everything  on  the 
FVLIST,  the  PremFVLIST,  and  the  CLIST.  If  these  lists  are 
empty,  another  strategy  is  used  (see  the  next  chapter).  The 
other FINDs, e.g., FINDANDS and FINDMTPS, work similarly.

G.  TESTing  and  SEARCHing 

TESTUI(X) and TESTEG(X) succeed if the current goal
could come from X by universal instantiation or existential
generalization respectively. Thus, if '(Ex)(Fx&Gx)' is the
current goal then TESTEG('(Fa&Ga)') succeeds; but
TESTEG('(Fa&Gb)') would fail. Note that if the goal is
'(Ex)(Fx&Ga)', then TESTEG('(Fa&Ga)') succeeds.

One  of  the  strategies  used  by  THINKER  requires  it  to  be 
able  to  locate  negations  of  biconditionals,  of  conditionals, 
and  of  disjunctions.  The  function  SEARCHNEGS  goes  around  the 
'¬' ring checking whether there are any, and returns that
formula unnegated (unless it is already a goal, in which
case it continues around the ring). Another of the
strategies involves looking for a conditional that does not
also have its consequent as an antecedent line. SEARCHARROW
goes through the '→' ring looking for this and returns it.
Similarly SEARCHWEDGE looks for a disjunctive formula where
neither  disjunct  has  occurred  as  an  antecedent  line.  (There 
are  further  conditions  on  these  functions  which  will  be 
discussed  in  the  next  chapter). 

This  completes  discussion  of  the  low-  and 
intermediate-level  functions,  patterns  and  data  types  used 
by  THINKER.  In  the  next  chapter  I  discuss  how  these 
lower-level  functions  are  employed  by  the  heuristics. 


VI.  THE  BRAINS:  HEURISTICS 


There  are  two  structurally  distinct  parts  to  a  Kalish  & 
Montague  proof:  the  'show'  lines  and  a  sequence  of 
antecedent  lines.  As  indicated  in  Chapter  V,  these  are  kept 
separate  in  THINKER  by  two  different  data  structures:  the 
GOALSTACK  and  the  ANTELINES  table.  In  a  Kalish  &  Montague 
proof,  one  is  always  working  at  proving  the  most 
recently-added  goal,  by  adding  more  and  more  antecedent 
lines  until  a  certain  configuration  of  these  lines  is 
attained.  When  this  happens,  that  goal  is  proved  --  it 
becomes  an  ANTELINE  (i.e.,  gets  "cancelled")  and  all  lines 
which had been added to ANTELINES after the goal was added to
GOALSTACK get deleted (i.e., get "boxed"). When the
first-to-be-added goal becomes antecedent, the proof is
finished. So when an argument to be proved is entered, the
premises (if any) are stored in the premise table, and the
goal is put on GOALSTACK. In the normal course of events,
the premises do not immediately yield a proof of the goal.
Whenever THINKER cannot immediately prove a goal (by methods
given below), it has a choice based on the main connective
of the formula to be proved: it can either make an
assumption or else it can set up one or more subsidiary
goals.  Although  GOALSTACK  is  a  stack,  THINKER  makes  sure 
that  a  proposed  new  goal  is  not  identical  with  one  which  is 
already  active  (by  sequentially  going  through  GOALSTACK 
using  TESTGOAL).  If  the  goal  is  already  active,  it  uses  a 
different  strategy  (normally,  to  make  an  assumption).  If 
THINKER is allowed to set a new goal, it uses the following
strategy:35 If the main connective is '↔', it sets itself
the two subgoals '(φ→ψ)' and '(ψ→φ)', to be proved independently.
When they are proved (and hence antecedent), THINKER uses
the rule CB ("Conditionals to Biconditional") to establish
the '↔' formula, and this allows the "cancelling" of the
original goal. Similarly, if the goal's main connective
is '&', it sets as subgoals each conjunct. After proving
them, it uses the rule Adj ("Adjunction") to establish the
'&' formula, which allows the "cancellation" of the original
goal. If the formula has a universal quantifier as its
main  connective  and  the  variable  of  quantification  is  not 
free  in  any  antecedent  line  (THINKER  looks  to  the  occurrence 
tables  for  this  information),  then  it  sets  the  unquantified 
formula  as  a  goal.  If  it  proves  this,  then  since  the 
variable  (not  being  free  in  antecedent  lines)  was  chosen 
"arbitrarily",  the  original  universally  quantified  formula 
goal  is  "cancelled"  and  becomes  antecedent. 

35 Some of these strategies are identical with Bledsoe's
(1971) "splitting heuristics".

Those are the only splitting heuristics, but there are
other ways for THINKER to set up subgoals. But before
discussing them, let us look at the other option open to
THINKER -- the making of assumptions. If the goal formula
has any connective other than the foregoing, or if the above
subgoals are already active, or if the formula is atomic,
then (subject to the two provisos to come) THINKER will make
an assumption. The Kalish & Montague rules for assumption
making (see Chapter II) allow one always to assume a
negation or to assume the antecedent (if the goal is a
conditional). THINKER does this as follows (subject to the
two provisos): (a) if the goal formula is a conditional, the
next line will be the antecedent of that goal (with the
annotation 'ASSUME'); (b) if the goal formula is a negation,
the next line will be the unnegated formula (with annotation
'ASSUME'); (c) in all other cases the next formula is the
negation of the goal (with the annotation 'ASSUME'). All
these assumptions are of course added to ANTELINES.
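
The choice between splitting and assuming can be summarised in a small decision function. This Python sketch is a simplification under stated assumptions: the goal has already been split into (op, lop, rop, var), the connectives are spelled in ASCII ('~', '->', '<->', '&', with 'A' for the universal quantifier), and neither the two provisos nor the test for already-active subgoals is modelled. The name next_step is hypothetical.

```python
def next_step(goal, parts, antecedent_free_vars=()):
    """Return ('subgoals', [...]) or ('assume', formula) for a goal.

    parts is (op, lop, rop, var) for a compound goal, None for an atom.
    """
    op, lop, rop, var = parts or (None, None, None, None)
    # Splitting heuristics.
    if op == "<->":
        return ("subgoals", ["(" + lop + "->" + rop + ")",
                             "(" + rop + "->" + lop + ")"])
    if op == "&":
        return ("subgoals", [lop, rop])
    if op == "A" and var not in antecedent_free_vars:
        return ("subgoals", [rop])
    # Assumption making, cases (a), (b) and (c).
    if op == "->":
        return ("assume", lop)
    if op == "~":
        return ("assume", rop)
    return ("assume", "~" + goal)
```

A universally quantified goal whose variable is already free in an antecedent line falls through to case (c), so the negation of the whole goal is assumed instead.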

The two provisos to this assumption-making are as follows. First, except
for one special circumstance which is not relevant here, no
line that is already antecedent can be entered again in the
proof. So if the would-be assumption is already antecedent
it will not be made. Secondly, recall that if '(φ→ψ)' is a
goal, it can be cancelled when 'ψ' occurs -- regardless of
whether 'φ' has been assumed. Thus, before such a 'φ' is
assumed, a quick check is made of whether 'ψ' can be derived
by SIMPLEPROOF(ψ), for which see below. If this fails, then
'φ' is assumed.

Certain  parts  of  a  proof  are  easier  to  complete  than 
others.  When  the  proof  has  progressed  so  far  that  just  one 
more  step  is  required,  it  is  easy  to  find  that  missing  step, 
since  there  are  but  a  small  number  of  possibilities.  In  a 
resolution  system  one  would  look  for  two  clauses  of  length 
one  that  are  complementary.  In  THINKER  there  are  more 
possibilities  because  there  is  more  than  one  rule.  In 
general, THINKER knows the goal to be proved and needs then
to search for antecedent lines which in one step will yield
the goal. For many of the rules of inference (MP, MTP, MT)
this requires finding pairs of antecedent lines, and hence
is theoretically an n² problem. But there are ameliorations
to theory. Thus if φ is the goal and we want to know whether
it comes by MP, we need only see if ANTELINES contains the
template '(@→φ)'. Since this is done by hashing it is quite
quick. If it succeeds, we look at the token of '@' and hash
again to see if it also is in ANTELINES. The efficiency of
this  then  depends  on  the  size  of  the  ANTELINES  table  in 
comparison  to  how  full  it  is;  in  SPITBOL  the  size  of  a  table 
changes  dynamically,  so  that  when  it  gets  too  full  for 
efficient  search,  it  is  expanded.  All  of  the  rules  of 
inference  can  be  seen  to  operate  in  this  manner;  so,  given 
that  one  knows  what  formula  is  to  be  proved,  the  template 
device  can  very  quickly  check  whether  it  can  be  done  in  one 
step from some rule of inference. This is THINKER's function
SIMPLEPROOF(φ), which attempts to find a one-step
justification of 'φ'. If it succeeds, 'φ' will be entered
into the proof matrix as the latest ANTELINE. Generally, the
point of this is that having 'φ' in the proof allows one to
cancel the most current goal. In the example mentioned above
(under "provisos"), '(φ→ψ)' was the most recent goal and can
be cancelled if 'ψ' occurs (unboxed) beneath it. Thus
THINKER calls SIMPLEPROOF(ψ) which, if successful, adds ψ
and thereby allows the goal to be cancelled.
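
The two hash lookups for the MP case can be shown in miniature. In this Python sketch a dictionary stands in for the ANTELINES table; formulae map to the proof lines on which they occur and templates map to their tokens. The caller supplies the parts of a binary formula, so no parser is needed here. The function names, the ASCII '->', and the returned pair of line numbers are assumptions of the sketch, and only MP is covered.

```python
from collections import defaultdict

# formula -> list of proof lines; template -> list of tokens
antelines = defaultdict(list)


def add_ante(line, formula, parts=None):
    """Record an antecedent line; parts=(lop, op, rop) if it is binary."""
    antelines[formula].append(line)
    if parts:
        lop, op, rop = parts
        antelines["(@" + op + rop + ")"].append(lop)
        antelines["(" + lop + op + "@)"].append(rop)


def simpleproof_mp(goal):
    """Return (conditional's line, antecedent's line) if MP yields goal."""
    # First hash: is there a conditional whose consequent is the goal?
    for token in antelines.get("(@->" + goal + ")", []):
        # Second hash: is that conditional's antecedent itself a line?
        if token in antelines:
            conditional = "(" + token + "->" + goal + ")"
            return antelines[conditional][-1], antelines[token][-1]
    return None


add_ante(1, "((A&B)->C)", ("(A&B)", "->", "C"))
before = simpleproof_mp("C")   # '(A&B)' is not yet an antecedent line
add_ante(2, "(A&B)", ("A", "&", "B"))
after = simpleproof_mp("C")    # now lines 1 and 2 justify 'C' by MP
```

No pair of lines is ever enumerated: the cost is two table probes per candidate conditional, which is the amelioration of the n² search described above.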

One special type of SIMPLEPROOF occurs so commonly that
it  was  thought  worthwhile  to  try  to  speed  it  up  even  more  by 
writing  it  as  a  separate  function.  The  case  is  that  we  wish 
to  prove  the  most  recent  goal  and  we  wish  to  know  whether 
the  last  line  we  added  to  ANTELINES  was  the  information 
necessary  to  do  it.  So  here  we  not  only  know  what  is  to  be 
proved,  but  also  know  one  of  the  formulae  to  use  in  doing 
the inference. If the goal is φ and the last line is ψ, we
can cut out some of the hashing required by SIMPLEPROOF by
(a) directly seeing if φ can come from ψ by a one-premise
rule of inference, or (b) seeing if a particular line (not a
template) is in ANTELINES for two-premise rules of
inference. Suppose the goal is φ and the last line is ψ. Now
see if φ = ψ; if not, see if ¬ψ is an ANTELINE; if not, check
the main connective of ψ. If it's a universal quantifier,
call TESTUI(ψ) (see Chapter V); if it is a '&', see if one
of the conjuncts is φ; if it is a double negation, see if φ
is the unnegated formula; if φ is existentially quantified,
call TESTEG(ψ) (see Chapter V); if φ has '∨' as its main
connective, see if ψ is one of the disjuncts. No one-premise
rule would require lookup in ANTELINES for this type of
function, but rather success can just be checked by patterns
on two given strings. The two-premise rules do require some
lookup, but not much. For example, if φ is the goal and ψ
the last line, and if ψ is of the form (X→φ), we look for X
in ANTELINES; if ψ is of the form (X∨φ) we look for ¬X in
ANTELINES; if ψ is of the form ¬X we look for (X∨φ) or
(¬φ→X) in ANTELINES. And so on for all the rules. This
special case of SIMPLEPROOF is called ONESTEP(ψ), where ψ is
the  line  to  be  used  in  constructing  a  one-step  proof  of  the 
most  recent  goal.  Keep  in  mind  that  the  argument  to 
SIMPLEPROOF  is  the  formula  to  be  proved  (which  may  not  be 
the  most  recent  goal),  while  the  argument  to  ONESTEP  is  the 
formula  to  use  in  proving  the  most  recent  goal. 
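
A fragment of ONESTEP can be sketched with nothing but string comparisons and set membership. The Python below covers only a handful of the rules; the ASCII connectives ('~', '->', '&', '|'), the rule labels returned, the set standing in for ANTELINES, and the helper names are all assumptions, and the goal is passed explicitly rather than taken from the GOALSTACK. Inputs are presumed well-formed and fully parenthesized.

```python
def balanced(s):
    """True if s is non-empty with properly nested parentheses."""
    depth = 0
    for ch in s:
        depth += (ch == "(") - (ch == ")")
        if depth < 0:
            return False
    return bool(s) and depth == 0


def middle(s, prefix, suffix):
    """Return the balanced text between prefix and suffix in s, else None."""
    if (s.startswith(prefix) and s.endswith(suffix)
            and len(s) > len(prefix) + len(suffix)):
        mid = s[len(prefix):len(s) - len(suffix)]
        if balanced(mid):
            return mid
    return None


def onestep(goal, last, antecedents):
    """Name a rule that yields goal in one step using `last`, else None."""
    # One-premise rules: patterns on the two strings, no table lookup.
    if last == goal:
        return "R"
    if last == "~~" + goal:
        return "DN"
    if (middle(last, "(" + goal + "&", ")") is not None
            or middle(last, "(", "&" + goal + ")") is not None):
        return "S"
    if (middle(goal, "(" + last + "|", ")") is not None
            or middle(goal, "(", "|" + last + ")") is not None):
        return "Add"
    # Two-premise rules: one lookup of a specific formula each.
    x = middle(last, "(", "->" + goal + ")")
    if x is not None and x in antecedents:
        return "MP"
    x = middle(last, "(", "|" + goal + ")")
    if x is not None and "~" + x in antecedents:
        return "MTP"
    if last.startswith("~") and "(" + last[1:] + "|" + goal + ")" in antecedents:
        return "MTP"
    return None
```

The contrast with SIMPLEPROOF is that one premise is already in hand, so the remaining work is at most a single probe for a definite formula rather than a probe for a template followed by a probe for its token.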

ONESTEP  is  called  each  time  a  new  line  is  added  to  the 
proof  by  the  FIND  routines  described  in  the  previous 
chapter, or when an assumption is made, or when a 'show'
line is cancelled. The first of these three cases is clear:
when a new line is added, see if that new line will prove
the most recent goal. The other two cases require comment.
If an assumption is made, it is either the assumption of the
antecedent of a conditional 'show' line or else the negation
of the 'show' line. In either case, it is possible (although
unlikely) that the 'show' line itself can be directly
obtained from the assumption. In the former case, suppose
the 'show' line is '(p→q)' and THINKER assumes 'p'. If
'(p→(p→q))' is an antecedent line, then ONESTEP('p') will
enter the line '(p→q)' and cancel the 'show'. In the latter
case, suppose the 'show' line is 'p' and THINKER assumes
'¬p'. If '(¬p→p)' is an antecedent line then ONESTEP('¬p')
will enter 'p' and cancel. The introduction of 'show' lines
by THINKER is rigidly controlled with an eye towards proving
something that will yield an immediately useful line. (See
below for exceptions to this.) Thus whenever a 'show' line
is cancelled it behooves THINKER to see if ONESTEP on that
'show' line will yield the next higher goal.

TRYRULES  is  a  "blind"  procedure  which  will  attempt  to 
apply  existential  instantiation,  quantifier  negation,  and 
the  propositional  rules  of  inference  to  the  antecedent 
lines.  Each  time  a  line  X  is  added  by  TRYRULES,  ONESTEP('X') 
is  called,  which  is  one  way  to  terminate  TRYRULES.  Another 
way  to  terminate  it  is  if  an  antecedent  line  is  added  to 
which  heuristic  TRYNEGFLA  (below)  is  applicable.  It  should 
be noted that the existential quantifier rule (EI) is
applied only once under any 'show' level; thus if it has
already been applied and its results have not been "boxed
away", then it will not be called again, no matter how many
further levels of 'show' are introduced. Of course, TRYRULES
(and  SIMPLEPROOF,  for  that  matter)  might  introduce  new 
antecedent  lines  to  which  TRYRULES  reapplies.  The  mechanism 
for  this  was  discussed  in  Chapter  V:  TRYRULES  calls  the  FIND 
procedures  and  hence  the  connective  rings.  Universal 
instantiation  is  rigidly  controlled,  the  overall  idea  being 
that  universally  quantified  lines  will  be  instantiated  only 
to  specific  terms.  In  Chapter  VII,  under  the  heading  "the 
EI/UI  problem",  I  shall  discuss  an  improvement  to  the  method 
discussed  here;  but  for  now  we  shall  say  that  a  universally 
quantified  antecedent  line  is  instantiated  (a)  to  every 
constant  in  the  proof  or  premises  (the  CLIST  table),  (b)  to 
every  free  variable  in  an  antecedent  line  (the  FVLIST 
table), (c) if the CLIST and FVLIST tables are empty, to the
variable  of  quantification  of  every  universally  quantified 
'show'  line.  And  finally,  if  (a),  (b)  and  (c)  are  not 
applicable,  then  some  variable  is  arbitrarily  picked. 
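
The order of preference among instantiation terms can be written down directly. This Python sketch only restates cases (a) to (c) and the final fallback; the function name, the list arguments standing in for the occurrence tables, and the default variable 'x' chosen for the arbitrary pick are assumptions.

```python
def ui_terms(clist, fvlist, universal_show_vars, arbitrary="x"):
    """Terms to which a universally quantified antecedent line is instantiated."""
    # (a) constants and (b) free variables, without duplicates.
    terms = []
    for term in list(clist) + list(fvlist):
        if term not in terms:
            terms.append(term)
    if terms:
        return terms
    # (c) variables of quantification of universally quantified 'show' lines.
    if universal_show_vars:
        return list(universal_show_vars)
    # Otherwise some variable is picked arbitrarily.
    return [arbitrary]
```

Restricting instantiation to terms that already occur keeps the number of new lines bounded by the size of the occurrence tables rather than by the supply of variables.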

TRYNEGFLA  is  a  rather  clever  strategy  which,  when  more 
direct  approaches  fail,  will  search  ANTELINES  for  an 
occurrence  of  the  negation  of  a  conditional  and  add  the 
unnegated  conditional  to  the  GOALSTACK.  (The  strategy  is 
that  if  this  new  goal  can  be  proved,  then  it  will  contradict 
the  negated  conditional  and  allow  the  cancelling  of  the  next 
higher  goal.  Generally  speaking,  proving  a  conditional  is 
easy  since  one  gets  to  make  an  assumption  of  the 
antecedent.)  Finding  the  relevant  negated  formula  is  done  by 
the  SEARCHNEGS  procedure  discussed  in  Chapter  V.  There  are 
similar  strategies  for  negated  biconditionals  and  negated 
disjunctions:  if  THINKER  finds  a  negated  biconditional  it 
tries  to  prove  the  unnegated  biconditional,  if  THINKER  finds 
a  negated  disjunction  it  tries  to  prove  one  of  the 
disjuncts.  Of  course  this  strategy  only  works  if  one  has 
assumed  the  negation  of  some  (true)  goal.  So  there  is  a 
global  variable  which  keeps  track  of  whether  such  an 
indirect  proof  has  been  started  and  only  attempts  TRYNEGFLA 
if  it  has  been. 

TRYCHAINING is a strategy which is THINKER's
implementation of forward chaining.36

36 This is not really either of the usual types of chaining:
forward or backward. It is not backward chaining, because in
that strategy we know what the present goal is, say ψ, and
on that basis set the goal (φ→ψ). But in THINKER's chaining
no attention is paid to the current goal. It is closer to
forward chaining, although still not identical to it. In

When other strategies fail, THINKER looks for a conditional
in  ANTELINES  for  which  the  consequent  is  not  also  in 
ANTELINES .  If  it  finds  one  such,  it  adds  the  antecedent  to 
the  GOALSTACK  and  recursively  calls  the  whole  set  of 
heuristics  on  it.  If  it  successfully  proves  it,  then  it  can 
do  a  FINDMP  which  it  could  not  do  before,  and  so  it  again 
can  TRYRULES.  There  is  a  similar  strategy  involving  FINDMT, 
where  THINKER  looks  for  a  conditional  whose  antecedent  does 
not  already  occur  negated  as  an  antecedent  line.  In  this 
case  it  will  add  the  negation  of  the  consequent  to  the 
GOALSTACK,  and  if  it  can  prove  it  then  it  can  again  TRYRULES 
and  be  guaranteed  of  at  least  one  new  line  by  FINDMT. 

Another  version  of  this  strategy  involves  disjunctions  and 
FINDMTP ,  where  THINKER  looks  to  ANTELINES  to  find  a 
disjunction  for  which  neither  disjunct  occurs  as  an 
antecedent  line.  If  THINKER  finds  one,  then  it  will  add  the 
negation  of  one  of  the  disjuncts  to  the  GOALSTACK.  If  it 
proves  this,  then  TRYRULES  will  yield  a  new  line  by  FINDMTP. 

When  THINKER  fails,  either  because  the  formula  to  be 
proved  is  not  a  theorem  or  else  its  heuristics  are 
inadequate  to  prove  it,  it  displays  the  proof  as  thus  far 
constructed  and  requests  the  user  to  enter  another  line 

3 6 ( cont ' d ) c lass ical  forward  chaining,  we  have  0  and  try  to 
prove  as  a  new  goal  (0->i|)),  for  some  arbitrary  ill  which  is  not 
necessarily  a  goal.  Thinker  on  the  other  hand  finds  (0-m1j)  in 
its  antecedent  lines  and  sets  0  as  a  new  goal,  again  without 
attempting  to  determine  whether  i|j  will  aid  it  in  proving  the 
current  goal.  In  the  same  sense  that  FINDMP  and  FINDMT  are 
"forward  chaining"  rules,  so  is  the  present  strategy. 

Indeed,  this  is  merely  Modus  Ponens  with  an  intermediate 
step  of  first  proving  the  antecedent. 
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(either  an  ANTELINE  or  a  new  goal).  It  adds  this  to  the 
proof  with  an  appropriate  annotation,  and  once  again 
attempts  the  proof.  With  this  facility  in  tandem  with  the 
"mathematically  oriented"  natural  deduction  system,  it  would 
seem  that  the  sort  of  man-machine  interaction  that  so  many 
researchers  have  touted  is  possible  (see  the  discussion  in 
Chapter  I  concerning  Fred  Faculty,  Jr.).  I  shall  not  pursue 
this  aspect  any  further  at  this  point,  but  wish  to  emphasize 
the  promise  that  this  holds. 
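
The three chaining variants described above (for FINDMP, FINDMT and FINDMTP) all pick a subgoal from the antecedent lines alone, without consulting the current goal. The following Python sketch illustrates only that selection step. It is not THINKER'S code: the tuple encoding of formulae, the function names, and the choice of the left disjunct in the MTP case are all assumptions made for illustration.

```python
def neg(f):
    """Return the negation of f, encoded as a ('not', f) tuple (hypothetical encoding)."""
    return ('not', f)


def chaining_goals(antelines):
    """Yield (rule, subgoal) pairs suggested by the antecedent lines alone.

    Formulae are atoms (strings) or tuples: ('not', A), ('imp', A, B), ('or', A, B).
    The current goal is deliberately never consulted.
    """
    lines = set(antelines)
    for f in antelines:
        if isinstance(f, tuple) and f[0] == 'imp':
            _, ante, cons = f
            # MP variant: the consequent is not yet a line, so try to prove the antecedent.
            if cons not in lines:
                yield ('MP', ante)
            # MT variant: the negated antecedent is not yet a line,
            # so try to prove the negation of the consequent.
            if neg(ante) not in lines:
                yield ('MT', neg(cons))
        elif isinstance(f, tuple) and f[0] == 'or':
            _, left, right = f
            # MTP variant: neither disjunct is a line, so try to refute one of them.
            # Picking the left disjunct is an arbitrary choice of this sketch.
            if left not in lines and right not in lines:
                yield ('MTP', neg(left))
```

Each pair names the rule that TRYRULES could fire once the subgoal has been proved; in THINKER the subgoal would be pushed on the GOALSTACK and the whole set of heuristics called on it.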

The  overall  monitor  of  all  these  other  heuristics  is 
PROOF.  It  adds  goals,  calls  the  lower-level  heuristics  as 
necessary,  and  recursively  calls  itself  as  new  goals  are 
added.  And  when  it  gets  "stuck",  it  calls  the  above 
mentioned  HELP  routine.  A  more  detailed  explanation  in 
pidgin  ALGOL  is  given  in  Appendix  II. 

I give here a moderately simple propositional calculus example which illustrates how THINKER works. The example is to show the equivalence between disjunction and the conditional: ((P∨Q)↔(¬P→Q)). This is put on the goal stack. Since it is a biconditional, PROOF adds a conditional to the goals: ((P∨Q)→(¬P→Q)), and calls itself recursively. First a call to SIMPLEPROOF('((P∨Q)→(¬P→Q))') is made, which fails. Next follows a call SIMPLEPROOF('(¬P→Q)'), which also fails. Then, since this is a conditional, it assumes (P∨Q) and asks whether ONESTEP('(P∨Q)') will prove the most recent goal. The answer is no, so it asks whether SIMPLEPROOF('(¬P→Q)') will. Again no, so it adds '(¬P→Q)' as a goal, recursively calls itself, and will eventually (after a SIMPLEPROOF call) assume '¬P'. It now calls ONESTEP('¬P') to see whether '(¬P→Q)' is derivable from the ANTELINES. The answer is no, so it calls SIMPLEPROOF('Q'). This succeeds (from the ANTELINES '(P∨Q)' and '¬P' by MTP) and so it adds 'Q' to ANTELINES, and then notices it can cancel the (¬P→Q) goal. So it does (and deletes the '¬P' ANTELINE), thus ending that recursive PROOF call. But this proves ((P∨Q)→(¬P→Q)), so it cancels that goal (and deletes the (P∨Q) ANTELINE). PROOF then decides that to prove the original goal, it needs to prove ((¬P→Q)→(P∨Q)), so this is added to the goals, PROOF is recursively called and assumes (¬P→Q). ONESTEP('(¬P→Q)') fails, as does SIMPLEPROOF('(P∨Q)'). So '(P∨Q)' is added to the goals, and PROOF called recursively. It assumes '¬(P∨Q)'. At this stage the proof looks like


 1.  Show ((P∨Q)↔(¬P→Q))
 2.  *Show ((P∨Q)→(¬P→Q))
 3.  (P∨Q)                    Assume
 4.  *Show (¬P→Q)
 5.  ¬P                       Assume
 6.  Q                        3,5 MTP
 7.  Show ((¬P→Q)→(P∨Q))
 8.  (¬P→Q)                   Assume
 9.  Show (P∨Q)
10.  ¬(P∨Q)                   Assume

No rules of inference apply to our three ANTELINES (#2, 8, 10). PROOF notices that line 10 is the negation of a disjunction and so calls TRYNEGFLA, which adds 'P' as a goal and calls PROOF recursively. '¬P' is assumed, and a MP is performed with the lines 8 and 12, yielding 'Q'. ONESTEP('Q') is called and succeeds, since by doing ADD on it we generate a contradiction and hence can cancel our most recent goal. Having proved P and thus adding it to ANTELINES, PROOF notes that ONESTEP('P') allows us to prove the most recent goal (by ADD). But this most recent goal, '(P∨Q)', was the consequent of the previous goal, and so that goal too is proved. We are now on the topmost recursion, trying to prove line 1, and we have two ANTELINES (#2, 7). We apply the rule CB to give us the final line #17, which allows us to cancel line 1, finish and print out the proof.


 1.  *Show ((P∨Q)↔(¬P→Q))
 2.  *Show ((P∨Q)→(¬P→Q))
 3.  (P∨Q)                    Assume
 4.  *Show (¬P→Q)
 5.  ¬P                       Assume
 6.  Q                        3,5 MTP
 7.  *Show ((¬P→Q)→(P∨Q))
 8.  (¬P→Q)                   Assume
 9.  *Show (P∨Q)
10.  ¬(P∨Q)                   Assume
11.  *Show P
12.  ¬P                       Assume
13.  Q                        8,12 MP
14.  (P∨Q)                    13, Add
15.  ¬(P∨Q)                   10, R
16.  (P∨Q)                    11, Add
17.  ((P∨Q)↔(¬P→Q))           2,7 CB


Further examples of more interesting problems, together with comparisons to other theorem provers, will be given in the next chapter and Appendix I.


VII.  SOME  COMPARISONS  AND  DISCUSSION 


A.  The  Logic  Theorist  and  the  British  Museum  Algorithm 

The Logic Theorist (LT), in its original formulation (Newell et al 1957), managed to prove 38 out of the first 52 theorems of Whitehead & Russell (1910: Chap. 2);37 the "new, improved" LT of Stefferud (1963, also reported in Newell & Shaw 1972) managed to prove 19 of the 67 theorems in Chapter 2 of Whitehead & Russell, plus three from Chapter 3. On the whole this was a pretty meagre achievement, since the theorems are extremely simple. A selection of the Whitehead & Russell theorems is proved in Appendix I. As mentioned in Chapter III, the most difficult one proved by the original LT was ((p→q)→(¬q→¬p)), which was apparently not proved by the new LT. The most difficult of the ones proved by the new LT is (¬¬p→p).

THINKER proved all these theorems without difficulty; the most difficult was ((p∨q)→(p∨r))→(p∨(q→r)), which is presented in Appendix I together with a sampling of ones of interest from the two versions of the LT. "Difficulty" in this context means "how much CPU time was used" and "how many statements were executed". Such comparisons are somewhat unfair to THINKER because this is the full version of THINKER, where many of the statements executed and much of the time involved have to do with various checks relevant to quantified statements. Since neither of the LTs was concerned to prove quantified statements, they were not encumbered by such considerations. It is estimated (by comparison with an earlier, less efficient version of THINKER which only did propositional arguments) that the mere addition of code sufficient for quantifier checking makes execution slow down by a factor of 2 or 3, even when there are no quantifiers in the formulae to be considered. Nevertheless THINKER proved these theorems much faster than either of the LTs did. Time comparisons here are difficult because of the different machines and languages used. Both versions of LT were programmed in IPL-V and apparently run on a JOHNNIAC. THINKER was run on an AMDAHL 470 V/8.

37 The data on LT derives from the report made by Siklossy et al 1973, as does the information about the "new LT".

THINKER'S ability to prove these theorems is not such a great feat. Wang (1963) gives a program which does the same thing, and this program is so simple that it is used as an example in Griswold et al (1968: p.183ff). Furthermore even a breadth-first "exhaustive enumeration" of proofs can do better than LT. Siklossy et al 1973 report on such a program and compare its results to those of the two versions of LT. Everything either of the LTs could prove was proved by their exhaustive search method, plus some others. Details can be found in Appendix I. The most difficult theorem which was proved by this program was (p^q)-^(q_>p), for which see THINKER'S proof in Appendix I. The authors remark that in their opinion, the hardest theorem to prove of the first 52 is (¬p→q)→(¬q→p), while the hardest of the 67 in Chapter 2 of Whitehead & Russell is ((p∨q)→(p∨r))→(p∨(q→r)). As noted above, THINKER concurs in this assessment.

B.  Bledsoe's  "Natural"  Systems 

It is difficult to know the precise limits of Bledsoe's systems, since no list is given of what arguments or theorems can and cannot be proved. However, as mentioned in Chapter IV, Bledsoe et al (1972: p.59) remark that their system, IMPLY, is not complete.

    IMPLY is incomplete in many ways. For example, ...it
    can prove the skolemized formula
        (P0&(Px→Pf(x))→Pf(f(x)))
    but it cannot handle the following equivalent
    formula
        (¬P0∨(Px∨Pf(f(x))))&(¬P0∨(¬Pf(x)∨Pf(f(x))))
    because the substitution [0/x] satisfying the first
    conclusion does not satisfy the second.

Again, it is difficult to know just what they mean here. The induction axiom is just that: an axiom. Ergo, it seems implausible to suppose that their system proved it.38 What I suspect they mean is that IMPLY cannot prove the two formulations to be equivalent. Yet their equivalence is merely a matter of the propositional logic. In Appendix I two proofs are given, one with the equivalence stated in the propositional logic as

    (((p&(q→r))→s)↔((¬p∨(q∨s))&(¬p∨(¬r∨s))))

thereby showing that the formulations are equivalent in that sense, and the other proof stated in the quantifier logic as

    ((Ax)((Pa&(Px→(Ey)(Py&Rxy)))→(Ez)(Ew)(Pz&Rxw&Rwz))
     ↔
     (Ax)((¬Pa∨(Px∨(Ez)(Ew)(Pz&Rxw&Rwz)))&
          (¬Pa∨(¬(Ey)(Py&Rxy)∨(Ez)(Ew)(Pz&Rxw&Rwz)))))

38 Difficult but not impossible. For, they may have other methods in the (unstated!) background which yield each instance of the induction scheme. See, for example, Goguen (1980) for details. Even so, the fact remains that IMPLY could not prove a truth-functionally equivalent version of it.
To get this latter version I restored the implicit universal quantifiers to the front of their formulae, and treated the "successor function" applied to (universally quantified) x merely as an existentially quantified variable in the scope of that universal quantifier. (This was done since THINKER does not have arbitrary functions.) Of course, then "f(x)" no longer means the intended "the number after x" but only "something dependent on x". And "f(f(x))" will not be given the intended interpretation of "the number after the number after x", but only that of "something related to something dependent on x". Although these versions of the two formulations of the induction axiom do not quite say what the original formulations said, the difference between what I give and the original is the same for the two formulations, so that the resulting formulae are still equivalent. The second proof in Appendix I shows THINKER proving this. And just in case this formulation does not ring true, a version of these formulations is proved wherein the functions are replaced with constants.

As mentioned in Chapter IV, it seems likely that IMPLY is severely limited in what it can do. Indeed, I suspect it cannot prove many of the quantificational theorems given in Appendix I, and I am certain that it has not solved what I later in this Chapter call the UI/EI problem. And all this is assuming that it has a "built in" propositional checker to fall back on. (See the discussion in Chapter IV.)

C.  The  Full  Propositional  Logic 

THINKER has proved every propositional theorem (and argument) it has been asked to, and it has been asked every one from Kalish & Montague (1964: Chapt. 2), Whitehead & Russell (1910: Chapt. 2, 3 and 4), and Thomason (1972). There are 115 such theorems in Kalish & Montague, plus 45 exercises; there are 10 Whitehead & Russell theorems not included in the Kalish & Montague ones; and there are 5 Thomason theorems not so included, plus 30 exercises. Appendix I contains proofs of the more interesting ones, together with discussion.

Conversion to clause form requires the truth of certain propositional equivalences such as

    (¬(p→q)↔(p&¬q))
    ((p→q)↔(¬p∨q))
    (¬(p↔q)↔((p&¬q)∨(¬p&q)))
    ((p↔q)↔((p&q)∨(¬p&¬q)))
    (¬(p&q)↔(¬p∨¬q))
    (¬(p∨q)↔(¬p&¬q))
    ((p∨(q&r))↔((p∨q)&(p∨r)))

These cannot be proved within a resolution based system, since they are presupposed by such systems in converting to clause form. That they are not trivial to prove is shown by their proofs in Appendix I. The longest of the propositional theorems to prove is the associativity of ↔, Kalish & Montague's Theorem 95:

    (([p↔q]↔r)↔(p↔[q↔r]))
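
As a sanity check on the equivalences just listed, one can confirm by truth table that each is a tautology. The Python sketch below does only that; it establishes semantic validity and says nothing about how hard a natural deduction proof is to find, which is the point at issue above. The dictionary keys and helper names are labels invented for this illustration.

```python
from itertools import product


def imp(a, b):
    """Material conditional."""
    return (not a) or b


def iff(a, b):
    """Material biconditional on Boolean values."""
    return a == b


# Each entry encodes one formula as a function of the truth values of p, q, r.
EQUIVALENCES = {
    "negated conditional":   lambda p, q, r: iff(not imp(p, q), p and not q),
    "conditional":           lambda p, q, r: iff(imp(p, q), (not p) or q),
    "negated biconditional": lambda p, q, r: iff(not iff(p, q), (p and not q) or ((not p) and q)),
    "biconditional":         lambda p, q, r: iff(iff(p, q), (p and q) or ((not p) and (not q))),
    "negated conjunction":   lambda p, q, r: iff(not (p and q), (not p) or (not q)),
    "negated disjunction":   lambda p, q, r: iff(not (p or q), (not p) and (not q)),
    "distribution":          lambda p, q, r: iff(p or (q and r), (p or q) and (p or r)),
    "associativity of iff":  lambda p, q, r: iff(iff(iff(p, q), r), iff(p, iff(q, r))),
}


def is_tautology(formula):
    """True when the formula holds under all eight assignments to p, q, r."""
    return all(formula(p, q, r) for p, q, r in product([False, True], repeat=3))


if __name__ == "__main__":
    for name, formula in EQUIVALENCES.items():
        print(name, is_tautology(formula))
```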

D.  The  Predicate  Logic 

As  already  illustrated  by  the  proofs  in  Appendix  I  of 
the  quantifier  theorems  mentioned  in  discussing  Bledsoe, 
THINKER  is  quite  good  at  proving  a  wide  class  of  such 
problems.  Appendix  I  also  contains  a  selection  of  problems 
gathered  from  Kalish  &  Montague  Chapts.  3  and  4,  and 
Thomason  Chapts.  9-11.  THINKER'S  proofs  of  some  of  these  are 
perhaps  noteworthy. 

The following is a problem from Kalish & Montague which I regularly give my elementary logic students as an extra credit, "hard" problem at the middle of a year long course:

    (Ex)(Fx→P), (Ex)(P→Fx) |- (Ex)(P↔Fx)

The simplicity of this problem is deceptive. The premises each say that some object has the property indicated by the open formula following the quantifiers. But there is no guarantee that it is the same object, so one cannot infer the conclusion via the conditionals-to-biconditional inference rule. However, since the subformula P does not have x free in it, we are in this case allowed to make the inference. To see this, I usually tell my students to use this informal argument. If we assume the negation of the conclusion, we will have (by QN)

    1. (Ax)¬(P↔Fx)

Now, do a separation of cases: suppose first that P is false, then suppose that P is true. Either case is contradictory, so the assumption is false, hence the conclusion is true. If we suppose P false, do an EI on the first premise (to z9) and a MT with ¬P. This yields ¬Fz9. ¬P and ¬Fz9 together entail (P↔Fz9). But a UI on our assumption (1) yields ¬(P↔Fz9), contradictorily. If we suppose P true, do an EI (to z8) on the second premise and an MP, yielding Fz8. P and Fz8 together entail (P↔Fz8), which is contradicted by UIing the assumption.
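
The role played by P having no free x can be illustrated by a brute-force search over small finite models. The Python sketch below looks for an interpretation that makes both premises true and the conclusion false, first with P treated as a sentence letter and then with P allowed to vary with x (that is, as a one-place predicate). The function name and the encoding of extensions as tuples of Booleans are assumptions of this sketch, and a search over a few small domains is evidence rather than a proof of validity.

```python
from itertools import product


def countermodel(size, p_depends_on_x):
    """Search a domain of `size` objects for premises-true, conclusion-false.

    f_ext[x] is the truth value of Fx; p_ext[x] is the value of P at x.
    When p_depends_on_x is False, P must take the same value everywhere,
    which models a sentence letter with no free x.
    Returns a pair (f_ext, p_ext) or None if no countermodel exists.
    """
    domain = range(size)
    for f_ext in product([False, True], repeat=size):
        for p_ext in product([False, True], repeat=size):
            if not p_depends_on_x and len(set(p_ext)) > 1:
                continue
            premise1 = any((not f_ext[x]) or p_ext[x] for x in domain)   # (Ex)(Fx -> P)
            premise2 = any((not p_ext[x]) or f_ext[x] for x in domain)   # (Ex)(P -> Fx)
            conclusion = any(p_ext[x] == f_ext[x] for x in domain)       # (Ex)(P <-> Fx)
            if premise1 and premise2 and not conclusion:
                return f_ext, p_ext
    return None
```

With P as a sentence letter the search finds nothing on the domains tried, while letting P vary with x yields a two-element countermodel, which is exactly the "no guarantee that it is the same object" worry raised above.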

THINKER proved this problem differently. After introducing the assumption (1) above, and the premises, and EIing the premises to new variables, it UIs (1) to each of the new variables giving us

    2.  ¬(P↔Fz9)
    3.  ¬(P↔Fz8)

It then does a SEARCHNEGS on (2), setting the goal

    4.  show (P↔Fz9)

Since we already have P→Fz9 from the EI of the premise, we need only

    5.  show Fz9→P
    6.  Fz9    ASSUME
    7.  show P
    8.  ¬P

THINKER now does another SEARCHNEGS, this time on (3), setting the goal

    9.  show P↔Fz8

Again we already have Fz8→P from the premise, so we need only

    10. show P→Fz8
    11. P      ASSUME

THINKER now repeats 8, cancelling 10. Putting this together by CB with Fz8→P we can cancel 9. THINKER now repeats (3), cancelling by contradiction line 7 and hence 5. With 5 and our P→Fz9 we CB and cancel 4, which contradicts 2, thus proving the theorem.

THINKER can also prove some rather tedious problems, such as the following from Kalish & Montague.

    |- [([(Ex)Fx↔(Ex)Gx]&(Ax)(Ay)((Fx&Gy)→(Hx↔Jy)))→
        [(Ax)(Fx→Hx)↔(Ax)(Gx→Jx)]]

    (Ax)(Fx→(Ax)Gx), ((Ax)(Gx∨Hx)→(Ex)(Gx&Ix)),
        ((Ex)Ix→(Ax)(Jx→Kx)) |- (Ax)((Fx&Jx)→Kx)

    [(Ax)(Fx→Gx)∨(Ex)(Fx&Hx)], (Ex)Fx,
        (Ax)(Ix→(¬Jx∨¬Hx)), (Ax)(Fx→(Ix&Jx)) |- (Ex)(Gx&Fx)

    (Ax)(Fx→(Gx∨Hx)), (Ax)((Gx∨Hx)→Ix),
        ¬(Ex)(Ix&Gx), (¬(Ex)Fx→(Ex)Gx) |- (Ex)(Fx&Hx)

    (Ex)(Fx&¬Gx), (Ax)(Fx→Hx), (Ax)((Jx&Ix)→Fx),
        [(Ex)(Hx&¬Gx)→(Ax)(Ix→¬Hx)] |- (Ax)(Jx→¬Ix)

The proofs of these theorems are all in Appendix I.

E. The UI/EI Problem, I

There are some theorems THINKER has great difficulty in proving. Consider an attempt to prove (Ey)(Ax)(Py→Px). THINKER will assume its negation, and apply quantifier negation to it to yield

    1.  (Ay)¬(Ax)(Py→Px)

which is the only line THINKER will consider available from now on, until it generates new antecedent lines. Having no constants or free variables in the proof, THINKER "picks one" (say, z9) and does a universal instantiation (UI) yielding the new antecedent line

    2.  ¬(Ax)(Pz9→Px)

which allows another QN to

    3.  (Ex)¬(Pz9→Px)

THINKER will now perform an existential instantiation (EI) to a new variable, z8, yielding

    4.  ¬(Pz9→Pz8)

However, there is now a variable in the proof, viz., z8, to which line 1 ought to be instantiated (so THINKER believes). Hence another antecedent line is generated

    5.  ¬(Ax)(Pz8→Px)

In turn, line 5 is QNed and then EIed (to a new variable z7) yielding

    6.  ¬(Pz8→Pz7)

z7 is a new variable, hence 1 will be UIed to it. As can be seen, this leads to a fruitless series of UIs alternating with EIs. In this problem, THINKER ought not to continue this, but rather (once it has 4) set up a new goal

    7.  Show (Pz9→Pz8)

using the SEARCHNEGS strategy. In fact, THINKER has at its disposal only 40 variables, so eventually it will set itself the goal 7. But this happens only after 39 EI/UI combinations (preceded by the initial UI) which will make the 40th EI fail. Only then does THINKER move on to the SEARCHNEGS strategy. This is the phenomenon I dub "the UI/EI problem".
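
The arithmetic of the runaway can be made concrete with a small simulation. The Python sketch below models only the bookkeeping described above: one initial UI to an arbitrarily picked variable, then repeated EI/UI rounds until the pool of variables is exhausted, at which point the next EI fails and SEARCHNEGS is finally tried. The variable names v1..v40 and the function name are hypothetical; THINKER'S own variable naming and control code are not reproduced here.

```python
def ui_ei_runaway(pool_size=40):
    """Simulate the alternation of EI and UI until no fresh variable remains.

    Returns (rounds, steps), where rounds counts completed EI/UI combinations
    and steps is the sequence of (rule, variable) events.
    """
    unused = [f"v{i}" for i in range(1, pool_size + 1)]
    steps = []

    picked = unused.pop(0)            # no constants or free variables: pick one
    steps.append(("UI", picked))      # the initial UI of line 1

    rounds = 0
    while unused:                     # an EI needs a variable new to the proof
        fresh = unused.pop(0)
        steps.append(("EI", fresh))   # QN followed by EI introduces `fresh`
        steps.append(("UI", fresh))   # line 1 is then instantiated to `fresh`
        rounds += 1

    steps.append(("SEARCHNEGS", None))  # the next EI fails, so fall back
    return rounds, steps


if __name__ == "__main__":
    rounds, steps = ui_ei_runaway()
    print(rounds, len(steps))
```

With the default pool of 40 variables the simulation completes 39 EI/UI rounds before the fallback, matching the count given in the text.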

It is worth noting that this problem also arises in resolution provers. Suppose such a prover has a "unit preference" strategy: whenever a unit resolution is available, do it; if none are available perform some other resolution. And suppose it has these two clauses available

    ¬Px ∨ Pf(x)
    Pf(x)

Such a prover will note that the substitution of f(x) for x in the first clause allows it to perform a unit resolution against the second clause yielding

    Pf(f(x))

But this allows another unit resolution against the first clause by substituting f(f(x)) for x, and yields

    Pf(f(f(x)))

As one can see, this too is an unending sequence of unit resolutions. Indeed, this is essentially the same problem used earlier to illustrate the difficulty in THINKER, for the first clause is merely the clausal form of

    (Ax)(Ey)(Px→Py)

and the second clause represents being able to get Pz8 in THINKER. (The f(x) of the second clause means that its variable was existentially quantified and was originally in the scope of a universal quantifier, as the transition from lines 3 to 4 of the THINKER proof above has it.)

Two methods of avoiding the problem in THINKER suggest themselves. The first would be to have THINKER apply the SEARCHNEGS strategy as soon as it became available, rather than continuing with the UIs. The second would be to keep track of variables introduced in this undesirable manner. That is, whenever a new variable is introduced by an EI and that line has come from a UI, do not allow that universally quantified line to perform another UI to this new variable.

The first method was judged inadequate on the grounds that there is no guarantee that SEARCHNEGS is in general the appropriate strategy. Indeed, perhaps other sentential rules such as MP, DN, etc., should be performed before SEARCHNEGS. In any case, there is no guarantee that SEARCHNEGS will succeed in finding a formula which is the negation of an →, of a ↔, or of a ∨, so it would be unwise to stop the TRYRULES early.

For these reasons, and in consultation with Dan Wilson, I opted to implement the other strategy. It is obvious, however, that the example cited at the beginning of this section is only one of many possible ways the UI/EI problem might arise. Consider a proof which has these two lines as antecedent

    (Ax)Fx
    (Ax)(Fx→(Ey)Gy)

The first might get instantiated to z9, which leads to the second being UIed to z9 and a MP performed, leaving the proof with

    (Ey)Gy

as an antecedent line. An EI is now performed on this line to the new variable z8. But this now allows the first two lines to be UIed to z8 and the unending (until all 40 variables are used) series of UI/EI continued.

Generally, what THINKER wants is to keep track of universal formulae no matter how far back they lie in a chain of justification that eventually leads to an EI. Having kept track of these, we wish to prohibit such universal formulae from being UIed to the new variable obtained by the EI. Two constructs are necessary for this: the ancestor list of a formula (that formula's A-list), and the prohibited list of variables for a formula (that formula's P-list). These are the two further fields of the data types ANTE and GOAL mentioned in Chapter V, and promised there to be explained in this Chapter. They are in fact lists; the first is a list of pointers to lines in the proof matrix (the PRMAT) and the latter is a list of variables. The implementation of the A-list is effected by (optional) extra arguments to the function ADDPROOF: whenever a formula is added to the proof, pointers to its ancestor(s) are also computed. Of course, all one needs keep track of are the universally quantified ancestors. So the relevant method is a matter of copying the pointers (to universally quantified lines) of a line's immediate parents. If line X comes from line Y (in ways soon to be specified) and line Y is universally quantified, then line Y is on line X's A-list. In addition, if line X comes from line Y (in the relevant ways) then every line on Y's A-list is also on X's A-list. In effect then, the A-list of a line in the proof is a list of all the universally quantified formulae which that line depended upon (in the sense of there being an inference chain which invoked the universally quantified line).

Now, whenever an EI is performed on a line, every member of that line's A-list has the new variable added to its P-list. And a restriction on the function FINDUI prevents a UI being performed on a universally quantified line to any variable on that line's P-list.

One  computes  A-lists  as  follows: 

1.  Premises  and  the  initial  "show"  have  a  null  A-list. 

2.  If  a  line  X  comes  by  UI  from  line  Y,  then  line  Y  and  all 
of  Y's  A-list  are  on  line  X's  A-list. 

3.  If  a  line  X  comes  by  a  rule  of  inference  from  line  Y 
(and  possibly  also  line  Z  in  the  case  of  a  two-premise 
rule  of  inference),  then  line  X  has  all  of  Y's  (and  Z's) 
A-list  as  its  A-list. 

4.  An assumption has the A-list of the "show" line it was
assumed from.

5.  If a "show" line is generated by a SPLITTING heuristic,
then it has the A-list of the "show" line it came from.

6.  If a "show" line is generated by FINDNEGS or CHAINING,
then it has the A-list of the line in the proof that
gave rise to this "show".39

Now, if a line X is EIed to a new variable, z9 say, then z9 is placed on the P-list of all of X's A-list. And UIs are not allowed on a (universally quantified) line to any member of its P-list.
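
The A-list and P-list bookkeeping can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a transcription of THINKER: the `Line` class and the helper names are hypothetical stand-ins for the ANTE/GOAL records and ADDPROOF, formulae are plain strings, and only rules 2 and 3 together with the EI step and the FINDUI restriction are modelled (premises simply start with an empty A-list, as in rule 1).

```python
from dataclasses import dataclass, field


@dataclass
class Line:
    formula: str
    universal: bool = False                        # is the line universally quantified?
    a_list: list = field(default_factory=list)     # universally quantified ancestors
    p_list: set = field(default_factory=set)       # variables barred from UI on this line


def add_by_ui(source, formula, universal=False):
    """Rule 2: a line obtained by UI inherits its source and the source's A-list."""
    ancestors = [source] + [a for a in source.a_list if a is not source]
    return Line(formula, universal, ancestors)


def add_by_rule(formula, *parents, universal=False):
    """Rule 3: a line obtained by a rule of inference inherits its parents' A-lists."""
    ancestors = []
    for parent in parents:
        for a in parent.a_list:
            if not any(a is b for b in ancestors):
                ancestors.append(a)
    return Line(formula, universal, ancestors)


def add_by_ei(source, formula, new_var):
    """EI: bar new_var on every ancestor of the source, then derive as in rule 3."""
    for ancestor in source.a_list:
        ancestor.p_list.add(new_var)
    return add_by_rule(formula, source)


def ui_allowed(line, var):
    """The FINDUI restriction: no UI to a variable on the line's own P-list."""
    return line.universal and var not in line.p_list


# The opening steps of the proof attempt for (Ey)(Ax)(Py->Px) from section E.
line1 = Line("(Ay)~(Ax)(Py->Px)", universal=True)
line2 = add_by_ui(line1, "~(Ax)(Pz9->Px)")
line3 = add_by_rule("(Ex)~(Pz9->Px)", line2)        # QN
line4 = add_by_ei(line3, "~(Pz9->Pz8)", "z8")       # EI to z8
```

After these four steps z8 sits on line 1's P-list, so the restricted FINDUI will not instantiate line 1 to it and the alternation of UIs and EIs is cut off.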

F.  The  UI/EI  Problem,  II 

The above emendation allows one to prove the theorem mentioned in the last section, as the proof in Appendix I illustrates. However, consider this problem:

    (Ex)(Ay)(Az)((Py→Qz)→(Px→Qx))

THINKER will assume its negation and do a QN leaving it the line

    1.  (Ax)¬(Ay)(Az)((Py→Qz)→(Px→Qx))

THINKER now does a UI (to z9, say), a QN and an EI (to the new variable z8)

    2.  ¬(Az)((Pz8→Qz)→(Pz9→Qz9))

z8 is on 1's P-list (since 1 is on 2's A-list) and so line 1 cannot be UIed to this. THINKER therefore does a QN on 2 and another EI (to z7) leaving

    3.  ¬((Pz8→Qz7)→(Pz9→Qz9))

39 I.e., if "¬(p→q)" was an antecedent line and FINDNEGS used it to generate the line "show (p→q)", then the latter has the A-list of the former. Similarly for the case where "(p→q)" was used by CHAINING to generate the line "show p".

z7 is also on 1's P-list (since 1 was on 3's A-list) and so line 1 cannot be UIed to this. THINKER therefore does a FINDNEGS and generates these "show" lines/assumption lines (the first is from FINDNEGS and the rest are the SPLITTING heuristics).

    4.  show ((Pz8→Qz7)→(Pz9→Qz9))
    5.  Pz8→Qz7    ASSUME
    6.  show Pz9→Qz9
    7.  Pz9        ASSUME
    8.  show Qz9
    9.  ¬Qz9       ASSUME

And now THINKER is stuck. No rules of inference can be applied to any of these lines. The proof, however, could be completed in the following way.

    10.  ¬(Ay)(Az)((Py→Qz)→(Pz8→Qz8))     1, UI (to z8)
    11.  ¬(Az)((Pz6→Qz)→(Pz8→Qz8))        10, QN, EI (to z6)
    12.  ¬((Pz6→Qz5)→(Pz8→Qz8))           11, QN, EI (to z5)
    13.  ¬(Ay)(Az)((Py→Qz)→(Pz7→Qz7))     1, UI (to z7)
    14.  ¬(Az)((Pz4→Qz)→(Pz7→Qz7))        13, QN, EI (to z4)
    15.  ¬((Pz4→Qz3)→(Pz7→Qz7))           14, QN, EI (to z3)
    16.  show ((Pz6→Qz5)→(Pz8→Qz8))       (via FINDNEGS on line 12)
    17.  Pz6→Qz5                          ASSUME
    18.  show Pz8→Qz8
    19.  Pz8                              ASSUME
    20.  show Qz8
    21.  ¬Qz8                             ASSUME
    22.  Qz7                              5,19 MP
    23.  show ((Pz4→Qz3)→(Pz7→Qz7))       (via FINDNEGS on line 15)
    24.  Pz4→Qz3                          ASSUME
    25.  show Pz7→Qz7
    26.  Qz7                              22, R

Line 26 cancels line 25, which cancels 23. If line 15 is repeated now, it contradicts line 23 and allows cancellation of line 20, which cancels line 18 and in turn 16. 16 contradicts 12 and hence cancels 8, cancelling 6 and then 4. In turn, this completes the proof by contradicting 3.

The solution to this problem requires being able to do
controlled UIs to variables on a formula's P-list. They must
be controlled or else one runs into the problem indicated in
the last section. But they must be allowed. In the above
example, line 1 was UIed to the variables z8 and z7, which
were on its P-list. The solution I adopted was: retain the
restriction given in the last section with regard to the
function FINDUI, which was used in the "blind" procedure
TRYRULES, but after all the strategies have been tried, see
whether the new function FINDUIPROHIB (which does a UI to
all variables on a formula's P-list) will add any new lines
to the proof. If not, continue on to the HELP routine; but
if it does add new lines, branch back and redo the various
strategies. Of course, any new EIs will add new variables to
the P-lists, but the original restriction is still in
effect: FINDUI will not allow UI to such variables. In the
above example, after we allowed a FINDUIPROHIB on line 1 to
the variables z8 and z7 yielding lines 10 and 13, THINKER
found itself able to do QNs and EIs on these lines. The EIs
generated new variables z6, z5, z4, and z3, which were
then placed on line 1's P-list (since lines 10 and 13 both
had line 1 on their A-list). Line 1 cannot be UIed to these
variables until all the strategies have applied, and in the
present case the strategies yield a proof. Had they not,
THINKER would eventually have reapplied FINDUIPROHIB and
discovered that new lines could be generated. And this would
then restart all the heuristics.
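
The order of events described in this paragraph can be sketched in
Python. This is only an illustration under stated assumptions, not
THINKER's code: the function name run_strategies, the callables passed
to it, and the modelling of a proof as a list of strings are all
hypothetical. Only the sequence (strategies first, then FINDUIPROHIB,
then either a restart or HELP) is taken from the text.

```python
from typing import Callable, List

Proof = List[str]  # assumption: a proof is modelled as a list of line labels


def run_strategies(
    proof: Proof,
    strategies: List[Callable[[Proof], None]],
    find_ui_prohib: Callable[[Proof], int],
    is_complete: Callable[[Proof], bool],
    help_routine: Callable[[Proof], str],
) -> str:
    """Try every strategy; if none finishes the proof, try FINDUIPROHIB.

    find_ui_prohib returns the number of new lines it added. If it added
    any, the strategies are restarted; otherwise control passes to HELP.
    """
    while True:
        for strategy in strategies:
            strategy(proof)
            if is_complete(proof):
                return "proved"
        if find_ui_prohib(proof) == 0:
            return help_routine(proof)


# Hypothetical stand-ins that mimic the example in the text.
def toy_strategy(proof: Proof) -> None:
    # Succeeds only once the prohibited UIs (lines 10 and 13) are present.
    if "10" in proof and "13" in proof and "QED" not in proof:
        proof.append("QED")


def toy_find_ui_prohib(proof: Proof) -> int:
    new_lines = [n for n in ("10", "13") if n not in proof]
    proof.extend(new_lines)
    return len(new_lines)


result = run_strategies(
    proof=["1", "2", "3"],
    strategies=[toy_strategy],
    find_ui_prohib=toy_find_ui_prohib,
    is_complete=lambda p: "QED" in p,
    help_routine=lambda p: "help",
)

stuck = run_strategies(
    proof=["1"],
    strategies=[toy_strategy],
    find_ui_prohib=lambda p: 0,      # nothing new can be added
    is_complete=lambda p: "QED" in p,
    help_routine=lambda p: "help",
)
```

In the first call the strategies fail once, the prohibited UIs add two
lines, and the second pass through the strategies completes the proof.
In the second call nothing new can be added, so control falls through
to the HELP stand-in.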

THINKER's proof of this theorem can be found in
Appendix I.

G.  The  UI/EI  Problem,  III 

Consider this theorem

(Ax)(Ay)(Ez)(Aw)((Fx&Gy)->(Hz&Iw)) ->
     ((Ex)(Ey)(Fx&Gy)->((Ez)Hz&(Aw)Iw))

THINKER  assumes 

1. (Ax)(Ay)(Ez)(Aw)((Fx&Gy)->(Hz&Iw))

and generates the goal-assumption-goal sequence

2. show (Ex)(Ey)(Fx&Gy)->((Ez)Hz&(Aw)Iw)
3. (Ex)(Ey)(Fx&Gy)                      ASSUME
4. show (Ez)Hz&(Aw)Iw

A  splitting  heuristic  now  generates  the  goal-assumption 
sequence 

5. show (Ez)Hz
6. ¬(Ez)Hz

Line 3 is EIed and its conclusion is EIed again (the two EIs
to distinct new variables)

7. Fz9&Gz8

Line 6 is QNed and line 7 is Sed (Simplified)

8. (Az)¬Hz
9. Fz9
10. Gz8

Line 1 is UIed to these two variables, yielding

11. (Ay)(Ez)(Aw)((Fz9&Gy)->(Hz&Iw))
12. (Ay)(Ez)(Aw)((Fz8&Gy)->(Hz&Iw))

The variables z9 and z8 are on 1's P-list, so 1 will not be
UIed to them (until after all heuristics have had their
chance to apply and FINDUIPROHIB is called). Lines 11 and
12, however, can be UIed to these variables, yielding

13. (Ez)(Aw)((Fz9&Gz9)->(Hz&Iw))        11, UI
14. (Ez)(Aw)((Fz9&Gz8)->(Hz&Iw))        11, UI
15. (Ez)(Aw)((Fz8&Gz9)->(Hz&Iw))        12, UI
16. (Ez)(Aw)((Fz8&Gz8)->(Hz&Iw))        12, UI

The lines 13-16 now allow EIs (to new variables)

17. (Aw)((Fz9&Gz9)->(Hz7&Iw))           13, EI
18. (Aw)((Fz9&Gz8)->(Hz6&Iw))           14, EI
19. (Aw)((Fz8&Gz9)->(Hz5&Iw))           15, EI
20. (Aw)((Fz8&Gz8)->(Hz4&Iw))           16, EI

Lines 17 and 18 have lines 11 and 1 on their A-list, thus
lines 11 and 1 have z7 and z6 on their P-list (and line 1
still has z9 and z8 also on its P-list). Lines 19 and 20
have lines 12 and 1 on their A-list, thus lines 12 and 1
have z5 and z4 on their P-list. Note, however, that z5 and
z4 are not on line 11's P-list, nor are z7 and z6 on line
12's P-list. Therefore line 11 can be UIed to z5 and z4,
while line 12 can be UIed to z7 and z6.

21. (Ez)(Aw)((Fz9&Gz5)->(Hz&Iw))        11, UI
22. (Ez)(Aw)((Fz9&Gz4)->(Hz&Iw))        11, UI
23. (Ez)(Aw)((Fz8&Gz7)->(Hz&Iw))        12, UI
24. (Ez)(Aw)((Fz8&Gz6)->(Hz&Iw))        12, UI

This yields four new EIs to new variables

25. (Aw)((Fz9&Gz5)->(Hz3&Iw))           21, EI
26. (Aw)((Fz9&Gz4)->(Hz2&Iw))           22, EI
27. (Aw)((Fz8&Gz7)->(Hz1&Iw))           23, EI
28. (Aw)((Fz8&Gz6)->(Hz0&Iw))           24, EI

Since lines 25 and 26 have lines 11 and 1 on their A-list,
z3 and z2 are on 11's and 1's P-list. And since lines 27 and
28 have lines 12 and 1 on their A-list, z1 and z0 are on
12's and 1's P-list. Note, however, that z1 and z0 are not
on line 11's P-list, nor are z3 and z2 on line 12's P-list.
This gives rise to more UIs on these lines; and by EI, to
more new variables in the proof and hence to more UIs.

The problem here is that lines 11 and 12 should not
have been UIed to z5, z4 and z7, z6 respectively. (I.e., the
lines 21-24 should not have been allowed.) Rather, the proof
should have proceeded

21. (Fz9&Gz8)->(Hz6&Iz9)                18, UI (or any variable for w)
22. Hz6&Iz9                             7, 21, MP
23. Hz6                                 22, S
24. (Ex)Hx                              23, EG

which will cancel goals up through line 5, and then the goal

25. show (Aw)Iw

would be added and proved similarly.

So  it  appears  that  when  a  universally  quantified  line 
comes  by  UI  from  another  line,  then  not  only  is  the  latter 
on  the  former’s  A-list,  but  also  the  former  should  be  on  the 
latter's  A-list.  Here,  lines  11  and  12  are  universally 
quantified  lines  which  come  by  UI  from  line  1,  so  not  only 
is  line  1  on  lines  11  and  12's  A-list,  but  also  lines  11  and 
12  should  be  on  1's  A-list.  That  is,  the  universally 
quantified  children  of  a  universally  quantified  parent 
should be ancestors of that parent (in addition to the
reverse).

The  mechanism  for  constructing  A-lists  was  suitably 
altered  and  a  proof  in  Appendix  I  shows  that  this  problem 
can  be  solved  using  it. 
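
The effect of the altered A-list mechanism can be illustrated with a
small Python sketch. It is a reconstruction under assumptions, not the
THINKER implementation: the names links, a_list, record_ei and replay
are hypothetical, and the A-list is recomputed on demand as everything
reachable through the recorded links (the text does not say how
THINKER stored or updated it). The sketch replays lines 1 and 11-16 of
the example and asks whether line 11 may still be UIed to z5.

```python
from collections import defaultdict

links = defaultdict(set)    # line -> lines recorded directly on its A-list
p_list = defaultdict(set)   # line -> variables the line may not be UIed to
universal = set()           # lines that are universally quantified


def add_line(child, parent, is_universal=False, symmetric=True):
    """Record that `child` was inferred from `parent`."""
    links[child].add(parent)
    if is_universal:
        universal.add(child)
    # The repair: a universal child of a universal parent is also put
    # on the parent's A-list.
    if symmetric and is_universal and parent in universal:
        links[parent].add(child)


def a_list(line):
    """Every line reachable through the recorded links, except `line` itself."""
    seen, todo = set(), [line]
    while todo:
        for nxt in links[todo.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                todo.append(nxt)
    seen.discard(line)
    return seen


def record_ei(line, new_var):
    """An EI on `line` introduced `new_var`: prohibit it for each ancestor."""
    for ancestor in a_list(line):
        p_list[ancestor].add(new_var)


def replay(symmetric):
    """Replay lines 1 and 11-16; report whether line 11 may be UIed to z5."""
    links.clear()
    p_list.clear()
    universal.clear()
    universal.add(1)
    add_line(11, 1, is_universal=True, symmetric=symmetric)
    add_line(12, 1, is_universal=True, symmetric=symmetric)
    for child, parent in ((13, 11), (14, 11), (15, 12), (16, 12)):
        add_line(child, parent, symmetric=symmetric)
    for line, var in ((13, "z7"), (14, "z6"), (15, "z5"), (16, "z4")):
        record_ei(line, var)
    return "z5" not in p_list[11]


allowed_before_repair = replay(symmetric=False)
allowed_after_repair = replay(symmetric=True)
```

Without the repair, z5 reaches only the P-lists of lines 12 and 1, so
line 11 is free to be UIed to it. With the repair, line 11 is
reachable from line 15 by way of line 1, so z5 lands on line 11's
P-list as well and the runaway sequence is blocked.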

H.  The  UI/EI  Problem,  IV

I  had  thought  that  with  the  method  indicated  in  the 
last  section  the  UI/EI  problem  had  been  solved  --  or  at 
least  solved  to  the  extent  it  can  be  solved  given  the 
undecidability  of  first  order  logic.  (The  undecidable 
classes  of  formulae  in  first  order  logic  are  those  whose 
prenex  normal  form  has  an  existential  quantifier  in  the 
scope  of  universal  ones.  And  wasn't  the  repair  indicated  in 
the  last  section  sufficient  to  handle  existential 
quantifiers  buried  arbitrarily  deeply  inside  universal 
quantifiers? The problem brought out and solved in UI/EI, II
only  handled  a  depth  of  embedding  of  one,  but  in  the  last 
section  this  was  generalized  to  any  depth.)  Indeed,  that 
method  can  prove  all  the  quantifier  theorems  from  Chapter 
III  of  Kalish  &  Montague,  and  almost  all  those  from  Chapter 
IV.  The  ones  it  cannot  prove  run  up  against  the  fourth  stage 
of  the  UI/EI  problem. 

Unfortunately, the way indicated in the last few
sections is not the only way in which infinite sequences
of UIs and EIs can be brought about. For example, there might be
two premises

1. (Ax)(Ey)Fxy
2. (Ax)(Ey)Gxy

(or these might have resulted from the assumption of a
conjunctive antecedent). In these cases, 1 and 2 will have a
null A-list. If z9 is already in the proof, they will be
instantiated to

3. (Ey)Fz9y
4. (Ey)Gz9y

which gives rise to EIs to new variables

5. Fz9z8
6. Gz9z7

where z8 is on line 1's P-list and z7 is on line 2's P-list.
But since neither is z8 on line 2's P-list nor z7 on line
1's P-list, we can do more UIs to get

7. (Ey)Fz7y
8. (Ey)Gz8y

which leads to new EIs with new variables

9. Fz7z6
10. Gz8z5

where z6 is on line 1's P-list and z5 is on line 2's P-list,
but not the reverse. Hence we can do more UIs followed by
EIs, etc.

It  seems  here  that  once  a  variable  is  on  one  of  the 
P-lists  it  ought  to  be  on  all  the  P-lists.  But  this  is  too 
strong,  for  if  no  variable  is  on  a  formula's  P-list,  a  UI 
ought  to  be  allowed  to  any  variable,  regardless  of  whether 
it  is  on  someone  else's  P-list.  That  is,  "innocent" 
universally  quantified  lines  (with  no  P-list)  should  not  be 
restricted  in  what  they  can  do;  only  those  universally 
quantified  lines  that  have  shown  themselves  to  be 
"untrustworthy"  (have  something  on  their  P-list)  should  be 
suspect . 

So  a  radical  restructuring  was  done.  A-lists  were 
constructed  as  before,  but  rather  than  have  a  separate 
P-list  for  each  formula,  one  common  P-list  was  maintained. 
And  associated  with  each  (universally  quantified)  formula 
was  a  boolean  that  indicated  whether  it  had  ever  contributed 
to the common P-list. If it had, then no member of
that list was permitted as a value of a UI (until it came
time to do a FINDUIPROHIB).
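
A minimal Python sketch of this restructuring follows. The class and
method names (CommonPList, record_ei, ui_allowed) are hypothetical and
the representation is an assumption made for illustration; only the
rule itself (one shared P-list, plus a per-formula flag recording
whether the formula has ever contributed to it) comes from the text.

```python
class CommonPList:
    """One shared P-list plus a per-formula 'has contributed' flag."""

    def __init__(self):
        self.variables = set()     # the single common P-list
        self.contributed = set()   # universal lines that have contributed to it

    def record_ei(self, new_var, a_list):
        """An EI produced `new_var`; every line on its A-list becomes a contributor."""
        self.variables.add(new_var)
        self.contributed.update(a_list)

    def ui_allowed(self, line, var):
        """The FINDUI test: 'innocent' lines are unrestricted."""
        if line not in self.contributed:
            return True
        return var not in self.variables


state = CommonPList()
state.record_ei("z8", a_list={1})   # line 5 comes by EI from line 3, a child of line 1
state.record_ei("z7", a_list={2})   # line 6 comes by EI from line 4, a child of line 2

line1_to_z7 = state.ui_allowed(1, "z7")
line2_to_z8 = state.ui_allowed(2, "z8")
innocent_to_z8 = state.ui_allowed(99, "z8")   # line 99: a hypothetical line with no EI descendants
line1_to_z9 = state.ui_allowed(1, "z9")       # z9 was never put on the common list
```

Under this rule lines 1 and 2 are each barred from both z8 and z7, so
the alternating UI/EI sequence of the two-premise example cannot get
started, while a universally quantified line that has never
contributed remains free to be instantiated to any variable.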

I.  The  Logic  of  Set  Theory

Some  proofs  in  set  theory  can  be  formulated  directly 
within first order logic without identity. We shall do this
by allowing 'F2' to stand for "is a member of". Russell's
paradox can be put "there is no set which contains exactly
those sets that are not members of themselves." This can be
translated as:

¬(Ex)(Ay)(Fyx<->¬Fyy)

THINKER proves this quite easily (see Appendix I). Since
"the Russell set" cannot exist, it follows that if there is
a set of things all of whose members are members of
themselves ("the anti-Russell set"), then not every set can
have a complement. I.e.,

(Ey)(Ax)(Fxy<->Fxx) -> ¬(Ax)(Ey)(Az)(Fzy<->¬Fzx)

Modern  set  theory  replaces  the  unrestricted  comprehension 
axiom  (that  every  property  determines  a  set)  with  a 
restricted  version:  given  a  set  z,  there  is  a  set  all  of 
whose  members  are  drawn  from  z  and  which  satisfy  some 
property. Now, if there were a universal set, then the
Russell set could be formed, per impossibile. So given the
restricted comprehension axiom there is no universal set.

(Az)(Ey)(Ax)(Fxy<->(Fxz&¬Fxx)) -> ¬(Ez)(Ax)Fxz

Next,  call  a  set  x  circular  if  it  is  a  member  of  a  set  z 
which  in  turn  is  a  member  of  x.  Intuitively,  all  the  sets 
are  non-circular,  but  if  we  could  pick  out  the  class  of  the 
non-circular  sets  we  could  thereby  pick  out  the  universal 
set. Hence there can be no class consisting of exactly the
non-circular sets

¬(Ey)(Ax)(Fxy<->¬(Ez)(Fxz&Fzx))

A  final  set  theoretic  problem  is  to  prove  that  set  identity 
is  symmetric,  given  the  definition  of  set  identity  in  terms 
of having all the same members. Again let F stand for "is a
member of" and let E stand for: a is identical to b.

(Ax)(Ay)(Exy<->(Az)(Fzx<->Fzy)) |- (Au)(Av)(Euv<->Evu)

This  problem  is  reported  by  de  Champeaux  (1979:  195)  as 
being  unsolvable  by  his  system.  THINKER  proves  all  these 
theorems  handily.  The  proofs  are  listed  in  Appendix  I. 
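
As a sanity check on the symbolizations (not as a substitute for the
proofs in Appendix I), one can search small finite domains for
countermodels. The Python sketch below does this for three of the
formulas above that involve only the membership relation F. The
function names are hypothetical, and a search of this kind can only
fail to find a countermodel; it does not establish validity.

```python
from itertools import product


def relations(n):
    """Yield every binary relation on range(n) as a set of ordered pairs."""
    pairs = [(a, b) for a in range(n) for b in range(n)]
    for bits in product([False, True], repeat=len(pairs)):
        yield {pair for pair, bit in zip(pairs, bits) if bit}


def russell(F, D):
    # ¬(Ex)(Ay)(Fyx <-> ¬Fyy)
    return not any(
        all(((y, x) in F) == ((y, y) not in F) for y in D) for x in D
    )


def no_universal_set(F, D):
    # (Az)(Ey)(Ax)(Fxy <-> (Fxz & ¬Fxx)) -> ¬(Ez)(Ax)Fxz
    antecedent = all(
        any(
            all(
                ((x, y) in F) == (((x, z) in F) and ((x, x) not in F))
                for x in D
            )
            for y in D
        )
        for z in D
    )
    consequent = not any(all((x, z) in F for x in D) for z in D)
    return (not antecedent) or consequent


def no_noncircular_class(F, D):
    # ¬(Ey)(Ax)(Fxy <-> ¬(Ez)(Fxz & Fzx))
    return not any(
        all(
            ((x, y) in F)
            == (not any((x, z) in F and (z, x) in F for z in D))
            for x in D
        )
        for y in D
    )


def holds_in_all_small_models(formula, max_size=3):
    """True if `formula` holds under every relation on domains of size 1..max_size."""
    return all(
        formula(F, range(n))
        for n in range(1, max_size + 1)
        for F in relations(n)
    )


results = {
    f.__name__: holds_in_all_small_models(f)
    for f in (russell, no_universal_set, no_noncircular_class)
}
```

A domain of three elements has only 512 binary relations, so the
search is immediate. The set identity problem is left out because it
involves a second relation E, which makes the search space much larger.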


J.  Andrews'  Challenge

According to de Champeaux (1979: 196), Peter Andrews
posed the following problem at the fourth workshop on
automated deduction at Austin, Texas, Feb. 1979.40

[(Ex)(Ay)(Px<->Py) <-> ((Ex)Qx<->(Ay)Py)]
     <->
[(Ex)(Ay)(Qx<->Qy) <-> ((Ex)Px<->(Ay)Qy)]

THINKER  breaks  this  problem  into  six  subproblems: 

A. cases where antecedent is
      [(Ex)(Ay)(Px<->Py) <-> ((Ex)Qx<->(Ay)Py)]
   a. and the sub-antecedent is (Ex)(Ay)(Qx<->Qy)
      i) show (Ex)Px->(Ay)Qy
      ii) show (Ay)Qy->(Ex)Px


40 The problem as reported by de Champeaux contains a number
of apparently typographical errors. There are so many of
them that it makes one think his theorem prover has not
actually proved it, contrary to his claim. (And there is
also the fact that his system couldn't prove the set
identity problem of the last section, which is considerably
easier.)

   b. and the sub-antecedent is (Ex)Px<->(Ay)Qy
      i) show (Ex)(Ay)(Qx<->Qy)

B. cases where antecedent is
      [(Ex)(Ay)(Qx<->Qy) <-> ((Ex)Px<->(Ay)Qy)]
   a. and the sub-antecedent is (Ex)(Ay)(Px<->Py)
      i) show (Ex)Qx->(Ay)Py
      ii) show (Ay)Py->(Ex)Qx
   b. and the sub-antecedent is (Ex)Qx<->(Ay)Py
      i) show (Ex)(Ay)(Px<->Py)

Let  us  just  take  a  look  at  one  of  these,  problem  Aai. 
THINKER  assumes  the  main  and  subsidiary  antecedents 

1. (Ex)(Ay)(Px<->Py) <-> ((Ex)Qx<->(Ay)Py)
2. (Ex)(Ay)(Qx<->Qy)
3. (Ex)Px

and  wants  to  show  that  everything  is  a  Q.  This  obviously 
follows  by  the  following  chain  of  reasoning. 

2  says  that  either  everything  is  a  Q  or  nothing  is. 
If  the  former,  THINKER  is  done.  So  we  need  but  show 
that  the  second  alternative  is  impossible.  The  left 
hand  side  (LHS)  of  1  says  that  either  everything  is 
a  P  or  nothing  is.  Given  3,  if  the  LHS  is  true,  it 
follows  that  everything  is.  Hence  if  the  LHS  is 
true,  then  (Ex)Qx;  so  the  "second  alternative"  is 
impossible.  Now,  what  if  the  LHS  is  false,  i.e., 
some  but  not  all  are  P's?  In  that  case  the  RHS  must 
also be false, so exactly one of (Ex)Qx and (Ay)Py
is false. But given that not all are P's, it follows
that the one which is false is (Ay)Py. So (Ex)Qx is
true, and again the "second alternative" is
impossible.

Similar reasoning holds for all the cases. In Appendix I
there are proofs of each of the six subproblems. (The entire
problem is too long to be conveniently presented as a
whole.)
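
Because the challenge formula uses only the one-place predicates P and
Q, its truth in a model depends only on which elements fall under P
and which under Q. The Python sketch below evaluates the whole
biconditional under every such assignment on domains of one to four
elements. The function names are hypothetical. By the standard
small-model property of monadic first-order logic without identity, a
countermodel with two predicate letters would already appear on a
domain of at most four elements, so under that assumption this
exhaustive check covers every case; it says nothing about how THINKER
itself finds the proof.

```python
from itertools import product


def subsets(n):
    """Yield every subset of range(n)."""
    for bits in product([False, True], repeat=n):
        yield {i for i, bit in enumerate(bits) if bit}


def iff(a, b):
    """Material biconditional on Python booleans."""
    return a == b


def andrews(P, Q, D):
    """Evaluate the challenge formula with extensions P and Q on domain D."""
    left = iff(
        any(all(iff(x in P, y in P) for y in D) for x in D),   # (Ex)(Ay)(Px<->Py)
        iff(any(x in Q for x in D), all(y in P for y in D)),   # (Ex)Qx<->(Ay)Py
    )
    right = iff(
        any(all(iff(x in Q, y in Q) for y in D) for x in D),   # (Ex)(Ay)(Qx<->Qy)
        iff(any(x in P for x in D), all(y in Q for y in D)),   # (Ex)Px<->(Ay)Qy
    )
    return iff(left, right)


def holds_up_to(max_size):
    """True if the formula holds for all P, Q on domains of size 1..max_size."""
    return all(
        andrews(P, Q, range(n))
        for n in range(1, max_size + 1)
        for P in subsets(n)
        for Q in subsets(n)
    )


challenge_holds = holds_up_to(4)
```

The largest domain checked has 16 x 16 = 256 pairs of extensions, so
the whole search is trivial by machine even though the natural
deduction proof is long.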


K.  Schubert's  Steamroller 

In  1978,  Lenhart  Schubert  presented  the  following 
problem  to  J.  Siekmann  of  Universitat  Karlsruhe  to  test  the 
graph-theoretic  resolution  prover  of  Siekmann  and  his 
associates.  (See  Chapter  III  for  discussion  of 
graph-theoretic resolution provers.) This prover appears to
be among the most advanced theorem provers in existence, but
was unable to prove this argument.41 An English version of
the  argument  is: 

Wolves,  foxes,  birds,  caterpillars,  and  snails  are 
animals,  and  there  are  some  of  each  of  them.  Also 
there  are  some  grains,  and  grains  are  plants.  Every 
animal  either  likes  to  eat  all  plants  or  all  animals 
much  smaller  than  itself  that  like  to  eat  some 
plants.  Caterpillars  and  snails  are  much  smaller 
than  birds,  which  are  much  smaller  than  foxes,  which 
in  turn  are  much  smaller  than  wolves.  Wolves  do  not 
like  to  eat  foxes  or  grains,  while  birds  like  to  eat 
caterpillars  but  not  snails.  Caterpillars  and  snails 
like  to  eat  some  plants.  Therefore  there  is  an 
animal  that  likes  to  eat  a  grain-eating  animal. 

Symbolized,  using  the  following  abbreviations 


41 It is not known what the current status of the Karlsruhe
theorem prover is.

P0: a is an animal
P1: a is a wolf
P2: a is a fox
P3: a is a bird
P4: a is a caterpillar
P5: a is a snail
Q0: a is a plant
Q1: a is a grain
S: a is much smaller than b
R: a likes to eat b

the  argument  becomes 


(Ax)(P1x->P0x) & (Ex)P1x
(Ax)(P2x->P0x) & (Ex)P2x
(Ax)(P3x->P0x) & (Ex)P3x
(Ax)(P4x->P0x) & (Ex)P4x
(Ax)(P5x->P0x) & (Ex)P5x
(Ex)Q1x & (Ax)(Q1x->Q0x)
(Ax)(P0x -> [(Ay)(Q0y->Rxy) v
             (Ay)((P0y&Syx&(Ez)(Q0z&Ryz))->Rxy)])
(Ax)(Ay)((P3y&(P5x v P4x)) -> Sxy)
(Ax)(Ay)((P3x&P2y) -> Sxy)
(Ax)(Ay)((P2x&P1y) -> Sxy)
(Ax)(Ay)[(P1x&(P2y v Q1y)) -> ¬Rxy]
(Ax)(Ay)((P3x&P4y) -> Rxy)
(Ax)(Ay)((P3x&P5y) -> ¬Rxy)
(Ax)((P4x v P5x) -> (Ey)(Q0y&Rxy))
|- (Ex)(Ey)(P0x&P0y & (Ez)(Q1z&Ryz&Rxy))


That  the  argument  is  valid  can  be  seen  by  this  reasoning: 


First,  let's  try  to  find  a  grain-eating  animal. 
According  to  a  premise,  it's  not  a  wolf.  But  since 
grains  are  plants,  it  follows  that  wolves  do  not 
like  to  eat  all  plants.  So,  by  a  premise,  they  must 
like  to  eat  all  animals  much  smaller  than  themselves 
that  like  to  eat  some  plants.  Since  they  also  don't 
like  to  eat  foxes,  it  follows  that  foxes  do  not  eat 
any  plants,  and  hence  don't  eat  any  grains.  As  far 
as  caterpillars  and  snails  go,  all  we  are  told  is 
that  they  like  to  eat  some  plants;  but  we  don't  know 
about  grains,  and  we  can’t  figure  it  out  since  we 
are  not  told  whether  any  animals  are  smaller  than 
caterpillars  or  snails.  This  leaves  the  birds,  and 
sure  enough,  since  they  do  not  like  to  eat  snails 
(which  like  to  eat  some  plants)  it  follows  that  they 
must  like  to  eat  all  plants.  So  now  we  have  to  find 
an  animal  that  likes  to  eat  birds.  Is  it  the  wolf? 
Well,  since  wolves  don't  like  to  eat  all  plants,  it 
follows that they like to eat all animals much
smaller  than  themselves  that  like  to  eat  some 
plants.  Are  birds  such  animals?  Certainly  birds  like 
to  eat  some  plants,  but  are  they  much  smaller  than 
wolves?  We  can't  tell.  All  we  know  is  that  birds  are 
much  smaller  than  foxes,  which  are  much  smaller  than 
wolves;  but  we  are  not  given  the  transitivity  of 
"much  smaller  than".  So  we  cannot  prove  that  wolves 
like  to  eat  birds.  Is  it  the  fox?  Well,  we’ve 
already  proved  that  foxes  do  not  like  to  eat  any 
plants,  so  they  must  like  to  eat  all  animals  much 
smaller  than  themselves  which  do  like  to  eat  some 
plants.  Again,  a  bird  is  an  animal  that  likes  to  eat 
some  plants,  and  we  are  also  given  that  birds  are 
much  smaller  than  foxes.  Thus  foxes  like  to  eat 
birds.  Since  there  are  foxes  and  birds,  it  follows 
that  there  is  an  animal  which  likes  to  eat  a 
grain-eating  animal.  QED 


L.  Can  THINKER  Prove  the  Steamroller? 

THINKER  can  prove  the  Steamroller,  at  least  in  theory. 
However,  due  to  time  and  space  limitations  it  has  not  yet 
succeeded.  It  is  of  some  interest  to  see  why  it  can  and  why 
it  hasn't,  for  this  failure  sheds  some  light  on  a  direction 
to  further  develop  THINKER  in  the  future.  (Such  a 
development  is  discussed  in  the  next  Chapter). 

First  let's  look  at  some  theory  about  how  proofs 
develop  in  Kalish  &  Montague.  I  wish  to  show  that  even  when 
a  provable  (sub)proof  is  "started  incorrectly",  it  can 
nonetheless  be  completed.  For  example,  it  is  already  built 
into  the  "proof  completion"  rules  that 

show (P->Q)
P              ASSUME
S
¬S

allows one to box and cancel, even though it was "started
incorrectly" by assuming an antecedent of the conditional
whereas it finished by finding a contradiction. The reason
for allowing this is that, given such a sequence of
lines, there is another proof which "starts in the same mode
that it ends." For the above, we have


* show (P->Q)
  P              ASSUME
  S
  ¬S
  * show Q
    ¬Q           ASSUME
    S            Repeat
    ¬S           Repeat


the  outside  block  of  which  starts  by  assuming  an  antecedent 
and  ends  by  proving  the  consequent,  while  the  inside  block 
starts  by  assuming  the  negation  and  ends  by  finding  a 
contradiction.  All  combinations  of  starting  and  ending 
proofs  in  Kalish  &  Montague  have  the  property  that  if  a 
subproof  is  started  in  one  way  and  ended  in  another,  then 
there  is  a  proof  which  starts  in  the  same  manner  it  ends. 
Since  this  is  so,  Kalish  and  Montague  merely  allow  any 
combination  of  "starts  and  ends." 

In  a  similar  vein,  suppose  an  indirect  subproof  for  a 
provable  formula  has  been  started,  so  that  there  is  a 
contradiction  lurking  somewhere.  And  now  suppose  that  an 
"extraneous"  show  line  is  added  to  the  proof.  Finally, 
suppose  that  we  now  discover  the  contradiction.  Finding  it 
of  course  only  allows  boxing  and  cancelling  up  to  the 
"extraneous"  show  line.  (Recall  that  one  cannot  enclose  an 
uncancelled  show  in  a  box).  Can  we  recover  from  the  mistake 
of  adding  the  extraneous  show  line?  The  answer  is  yes. 

We suppose that an indirect proof has been started and
that we could derive a contradiction from this but have
instead entered the extraneous "show φ" line.

show P
¬P             --start the indirect proof
show φ         --extraneous show line
.
Q
¬Q             --derive a contradiction

What should be done now is box and cancel the φ show line,
and then re-derive the contradiction. By hypothesis this can
be done: since Show φ is superfluous, nothing depends on it
(or on any assumptions it generates) to find Q and ¬Q. If it
were required, it wouldn't be superfluous. All that is
needed is that the contradiction be lurking prior to the
Show φ line.

A simple generalization of this is when a show line is
required but the wrong one is written down. In the above
example, suppose we need to write down Show ψ, but instead
write Show φ. This too can be remedied

show P
¬P             --start the indirect proof
show φ         --extraneous show line
show ψ         --correct show line

Now by hypothesis Show ψ is provable and is what is needed in
the subproof of Show P. (It is needed because, for example,
it contradicts some line between Show P and Show φ, or
because P is a conditional and ψ is its consequent, etc.) In
any of these cases, the proof could continue by cancelling
Show ψ, repeating the reason it was necessary (e.g., the
line it will contradict), using this to cancel Show φ, and
then re-setting the goal Show ψ, this time not within the
Show φ subproof. We now prove ψ and this time it will cancel
Show P. Suppose the reason to set Show ψ is that it will
contradict ¬ψ which is a line between Show P and Show φ. We
then have the lines

*show P
 ¬P
 ¬ψ
 *show φ       --superfluous
  *show ψ      --provable, by hypothesis
  ¬ψ           --Repeat
 *show ψ       --reset correct goal


In other words, if Show φ is truly superfluous, and if the
correct show line (here: Show ψ) can be found, then even
though we started incorrectly by writing Show φ, this can be
overcome. One does it by proving (the correct) Show ψ twice:
once in the scope of (the incorrect) Show φ subproof and
once afterwards. This is at the cost of some extra writing,
but can always be done. Furthermore, no matter how many
superfluous show lines are written, so long as the correct
one eventually comes up, this strategy will eventually yield
a proof.

This sort of situation happens in THINKER's attempted
proof of the Schubert Steamroller. There are six existential
premises here, so THINKER does six EIs (to six different
variables, call them a, b, c, d, e, f). It now does a number
of UIs on the other premises and some number of MPs. Since
these  are  done  to  all  six  of  our  variables,  there  are  quite 
a  large  number  of  lines  left,  differing  from  one  another 
only  in  what  variables  they  contain.  In  particular,  note 
that  the  following  types  of  lines  are  generated.  From 
assuming  the  negation  of  the  conclusion,  doing  QNs  and  UIs, 
we  have 

(1) ¬(P0α & P0β & Q1γ & Rβγ & Rαβ)

for all combinations of a, b, c, d, e, f substituted for α,
β, and γ. (So an indirect proof has been started and there
is a contradiction lurking somewhere.) The other type of
lines which are of interest have the form

(2) (Ay)(Q0y->Rαy) v (Ay)((P0y&Syα&(Ez)(Q0z&Ryz))->Rαy)

from a premise (for each of our six variables replacing α).

As it turns out, the most efficient strategy is to set
the unnegated correct instance of (1) as a goal and then as
a subgoal set the negation of a disjunct from the correct
instance of (2), using the CHAINING strategy. In particular,
if b is the variable generated by EI from (Ex)P2x (the fox)
and c is the variable generated by EI from (Ex)P3x (the
bird) and f is the variable generated from (Ex)Q1x (the
grain), we want to set

show (P0b & P0c & Q1f & Rcf & Rbc)

as  a  goal.  When  it  is  proved,  it  will  cancel  the  higher  goal 
by  contradicting  the  appropriate  instance  of  (1).  The 
SEARCHNEGS  strategy  finds  negations  of  formulae  like  (1), 
but  unfortunately  it  does  not  always  get  the  correct 
instance  first.  However,  as  I  showed  above,  even  if  it 
doesn't  get  the  correct  instance  first,  as  long  as  it 
eventually  gets  it  and  proves  it,  the  proof  will  succeed. 

Will  THINKER  ever  get  the  correct  instance?  And  if  it 
does,  will  it  then  (after  proving  the  superfluous  subgoal) 
turn  around  and  re-set  the  correct  instance  as  a  goal  if  it 
needs  it?  The  answer  to  both  questions  is  yes,  but  it  will 
take  a  long  time.  As  long  as  the  proof  has  not  been 
completed,  and  so  long  as  the  unnegated  formula  is  not 
already  a  goal,  FINDNEGS  simply  keeps  adding  these  formulae 
to  the  goal  stack  until  there  are  no  more  to  be  added.  So 
eventually  the  correct  one  will  be  added.  Suppose  now  that 
the  correct  instance  is  added  and  proved,  and  thus  the 
superfluous  instance  is  proved.  It  turns  out  that  in  the 
case of FINDNEGS there is no need to re-set the goal. For,
having  proved  even  an  irrelevant  instance  of  (1),  it  can 
cancel  the  next  higher  (irrelevant)  goal  and  so  on,  up  to 
the  point  where  the  indirect  proof  was  begun.  This  is  so 
because  each  of  these  goals,  whether  relevant  or  not,  was 
set  on  the  grounds  that  proving  it  would  allow  an  immediate 
cancellation  of  the  next  higher  goal  (due  to  a 
contradiction).  Thus  in  the  case  that  the  irrelevant  goals 
were  set  by  SEARCHNEGS,  the  proof  of  one  (the  correct,  but 
embedded)  goal  suffices  to  prove  them  all. 

The  matter  is  a  bit  complicated  in  the  Steamroller 
case,  because  there  are  so  many  incorrect  goals.  To  see  what 
will  happen,  consider  the  case  of  two  "wrong"  and  one 
"correct"  instance  of  (1),  where  the  two  wrong  goals  are  set 
first. Call the wrong instances ¬I1 and ¬I2, and the correct
instance ¬I3. THINKER would proceed as follows

¬I1
¬I2
¬I3
*show I1
*show I2
*show I3

We now suppose it can prove I3 and thereby cancel I2, and
then THINKER will consider whether it needs to re-set I3 as
a goal. It doesn't need to:

¬I1
¬I2
¬I3
.
*show I1
*show I2
*show I3
¬I3            --Repeat
¬I2            --Repeat

We see here that the next higher show can now be boxed, due
to the presence of ¬I2 and I2. With the SEARCHNEGS strategy
any of the instances proved will allow continuous
cancellation of goals "upward". All that's required in the
present case is to prove I3, no matter how far embedded. How
far is it embedded? Well, given six variables, and taking
them three at a time (with repetition) as indicated by
formula (1), we have 6^3 (= 216) instances of (1). One of
these is the correct instance.
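
The count can be reproduced by enumerating the instances directly.
The short Python sketch below builds each instance of formula (1) as a
string; the helper name instance and the string representation are
assumptions made for illustration only.

```python
from itertools import product

variables = ["a", "b", "c", "d", "e", "f"]   # the six EI variables


def instance(alpha, beta, gamma):
    """Instantiate formula (1) for one choice of the three variables."""
    return (
        f"¬(P0{alpha} & P0{beta} & Q1{gamma} "
        f"& R{beta}{gamma} & R{alpha}{beta})"
    )


# Three positions, six choices each, repetition allowed: 6 * 6 * 6.
instances = [instance(*triple) for triple in product(variables, repeat=3)]

# The correct instance uses the fox b, the bird c and the grain f.
correct = instance("b", "c", "f")
```

Exactly one of the 216 strings is the correct instance, which is why
the order in which SEARCHNEGS happens to try them matters so much for
running time.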

As  it  turns  out,  however,  to  prove  the  correct  instance 
we  need  to  do  CHAINING  by  using  formula  (2).  We  need  to  set 
the  negation  of  a  disjunct  of  the  correct  instance  of  (2)  as 
a  goal.  The  actual  subproof  here  is  rather  tedious,  but  in 
outline  it  goes  like  this. 

m. *show (P0b&P0c&Q1f&Rcf&Rbc)              --correct (1)
n. *show ¬(Ay)(Q0y->Rby)                    --correct disjunct of (2)
   (Ay)(Q0y->Rby)                           --ASSUME
p. (Ay)((P0y&Syb&(Ez)(Q0z&Ryz))->Rby)       --lines n,(2) MTP


Of  course,  there  are  doubtless  superfluous  show  lines 
between  m  and  n  --  all  those  incorrect  instances  of  (2)  that 
THINKER finds before n. And one might ask, since the case is
different from what it was before with SEARCHNEGS, whether it is
guaranteed that an eventual embedded box and cancel
always allows upward cancellation. After all, one asks, on
each  level  we  merely  use  the  embedded  cancellation  to 
generate  an  MTP.  What  guarantee  is  there  that  the  result  of 
MTP  on  a  superfluous  instance  will  lead  to  a  cancellation  of 
a  higher  show  line?  Suppose  then  we  have  such  a  case  -- 
suppose  there  is  one  superfluous  instance  of  the  (2)  show 
line  before  we  hit  upon  the  correct  instance.42  Let  I  be  the 
correct  instance  of  (1),  W  be  the  wrong  instance  of  (2),  C 
be  the  correct  instance  of  (2),  and  W'  and  C '  be  the  results 
of  MTP  with  W  or  C  on  the  appropriate  instance  of  (2).  We 
then  have 

m.   show I       --correct (1)
m'.  show W       --incorrect (2)
n.   *show C      --correct (2)
p.   C'           --n,(2) MTP


42 There are a total of six instances of (2).

Now, by hypothesis, proving the correct instance of (2) leads
to an MTP (our line p) which would eventually cancel line m,
the line we wish to prove (because an indirect proof has been
started). Here we use it to cancel line m' instead. Proving
line m' now allows us to do an MTP with the appropriate
instance of (2).


m.   show I       --correct (1)
m'.  *show W      --incorrect (2)
n.   *show C      --correct (2)
p.   C'           --n,(2) MTP
p'.  W'           --n,(2) MTP

Now, is there any guarantee that from p' we can eventually
cancel m? Yes: all that is required is to re-set C as a goal.
This can be done since it is no longer on the goal stack, and
(unless THINKER succeeds without it) it will be done since
THINKER eventually places all possible CHAINING goals on the
goal stack. And by hypothesis the proof of C allows a
cancellation of I.

    As can be seen, this process can become very messy
indeed. Unlike the case with SEARCHNEGS, where the
cancellation of any goal allows the cancellation of them all
(because they were all generated as negations of lines
currently antecedent), in CHAINING here the cancellation of
a goal merely allows the proof to proceed further. And if
the "further proceeding" is up a blind alley, THINKER needs
to return to the embedded correct goal for help; it needs to
re-set that goal outside the scope of the superfluous goal,
prove it again, and use this to prove the next higher goal.
Given that THINKER finds these goals by going
(deterministically) around the ∨-ring looking for formulae
of the form (Φ ∨ Ψ), it will always re-set its goals in the
same order.

    This is unfortunate if there are a number of
superfluous goals set. Consider the following, where instead
of one superfluous goal there are three. (I use W1, W2, W3
for these, and W1' etc. for the result of performing MTP with
the appropriate instance of (2). The schematic proof is given
after this paragraph.) The preceding discussion explains how
line m2 gets cancelled: the n...p correct instance proof
cancels m3, and then is repeated as n1...p1, which cancels
m2. m2 is used with its corresponding instance of (2) to
generate q2. However, m2 was superfluous, so q2 will not
yield a contradiction needed to cancel the next higher goal.
THINKER needs to reset C as a goal. But before it can do
that, it must set (the superfluous) W3 as a goal, since W3
was assumed to "come before" C in the ∨-ring and is also not
antecedent at this stage. So now the proof continues with W3
reset as a goal, and then C is reset as a goal. Of course,
having the correct goal (line n2) within the scope of an
incorrect goal (line q2) means that the correct goal has to
be reset once again outside its scope. And now the (correct)
proof between n2...p2 allows cancellation of the next higher
goal, in this case W1. This is as far as has been
illustrated in the schematic proof: line m1 has been
cancelled. Line m1, however, is another superfluous goal.
Thus in order to cancel line m, as we wish, we need to reset
C as a goal. But we cannot do this until W2 and W3 are set
as goals.

m.   show I        --correct (1)
m1.  *show W1      --incorrect (2)
m2.  *show W2      --incorrect (2)
m3.  *show W3      --incorrect (2)
n.   *show C       --correct (2)
p.   C'            --n,(2) MTP
q3.  W3'           --m3,(2) MTP
n1.  *show C       --reset correct (2)
p1.  C'            --n1,(2) MTP
q2.  W2'           --m2,(2) MTP
r3.  *show W3      --reset W3 as goal
n2.  *show C       --reset correct goal
p2.  C'            --n2,(2) MTP
q3.  W3'           --r3,(2) MTP
n3.  *show C       --reset correct goal
p3.  C'            --n3,(2) MTP

    To make a long story shorter, now that W1 is cancelled,
W2 is set as a goal at the same level of embedding as W1.
This sets W3 and then C as subgoals, eventually cancelling
W2, essentially by redoing the part of the proof between
lines m2 and q2. However, W2 is also a superfluous goal, so
the "next" goal (W3) is set as a goal at the same level of
embedding as W1 and W2. This in turn sets C as a subgoal,
eventually cancelling W3, by redoing the proof between r3
and q3. But W3 is also superfluous, so the next goal, C, has
to be set at the same level of embedding as W1, W2, and W3.
When this succeeds, THINKER can finally cancel line m, the
correct instance of (1).

    In general then, CHAINING finds goals in a certain
order. If superfluous goals are found first, an embedding of
the goals takes place until the correct one is found.
Finding the correct one allows a cancellation of the deepest
incorrect one, and then a repeat of the correct one, to
cancel the next higher incorrect one. But then the deeper
incorrect one is reset, embedding the correct one, proving
it, resetting the correct one and proving the deepest
incorrect one, and resetting the correct one and proving the
next deepest incorrect one. But then the embedded incorrect
ones need to be reset. This continues until all goals up to
and including the correct one (in the ∨-ring ordering) are
cancelled at the same level of embedding.

    Suppose the subproof of C plus the MTP and finding the
contradiction takes i steps, and suppose there are k
"pointless" steps in each superfluous part (that is, steps
not directly involved in an immediate subproof), and that
there are n superfluous subgoals before C in the ∨-ring. How
long will it take to prove the higher goal? Well, to prove
the most deeply embedded incorrect instance takes the proof
of C together with whatever superfluous steps are required,
i.e., i+k steps. The next higher one has embedded that
subproof plus a subproof of C plus the pointless steps,
i.e., (i+k)+(i+k). The third deepest embedding has that one
plus an embedding of the subproof of the second deepest plus
the subproof of C plus the pointless steps. In general, to
prove the very first superfluous goal takes

        SUM(j=1→j=n) [j(i+k)]

steps. But there are n of these to prove, each with one less
embedding, before we come to C being at the same level as
the first. I.e., to finally prove what is required
necessitates

        SUM(m=1→m=n) SUM(j=1→j=m) [j(i+k)]

total steps.
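
To get a feel for how quickly this grows, the two sums can be evaluated directly. The sketch below is only an illustration of the formulas just given; the function names are hypothetical, and the closed form (i+k)·n(n+1)(n+2)/6 is an added arithmetic observation, not something stated in the text.

```python
def steps_first_goal(n, i, k):
    """SUM(j=1..n) [j(i+k)]: steps to prove the very first superfluous goal."""
    return sum(j * (i + k) for j in range(1, n + 1))


def total_steps(n, i, k):
    """SUM(m=1..n) SUM(j=1..m) [j(i+k)]: the double sum of total steps."""
    return sum(steps_first_goal(m, i, k) for m in range(1, n + 1))


def total_steps_closed(n, i, k):
    """The same double sum in closed form: (i+k) * n(n+1)(n+2)/6."""
    return (i + k) * n * (n + 1) * (n + 2) // 6


# Illustrative values only: i=5 useful steps, k=2 pointless steps.
for n in (1, 3, 10, 50):
    print(n, total_steps(n, 5, 2))
```

The total is cubic in the number n of superfluous subgoals, which is why a large supply of irrelevant conditionals is so costly.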

    The upshot of this is that THINKER in theory will
eventually find a proof of the Steamroller. But it will not
be found in any reasonable time. I said before that there
were six instances of (2). While this is true, CHAINING is a
more general procedure than just looking for (Φ ∨ Ψ) lines:
it also looks for conditionals and sets their antecedents as
goals or the negation of their consequents. In the
Steamroller there are an incredibly large number of
conditionals. It is therefore an extremely large problem.
THINKER was given the problem and allowed to run for one
minute CPU time on an Amdahl 470 V/8. It generated 2850
lines of proof. The proof was structured correctly, as
indicated above -- the proper instance of (1) was finally
set as a goal (at line 350), and shortly thereafter43
CHAINING was called and started setting superfluous
subgoals. The first of these had not been proved by line
2850, although some of the most deeply embedded ones were
proved. It seems clear therefore that THINKER can prove the
Steamroller, and would do so given more time.


43  Of  course,  finding  the  proper  instance  of  (1)  does  not 
mean  that  SEARCHNEGS  will  stop  setting  goals.  Since  the 
proper  instance  requires  CHAINING  to  prove  it,  and  since 
CHAINING  is  called  after  all  SEARCHNEGS  are  found,  there  are 
many  more  SEARCHNEG  show  lines  generated  before  the  first  of 
the  CHAINING  show  lines  is  generated.  The  first  CHAINING 
show  line  is  at  line  776. 


VIII.  SOME  DIRECTIONS  FOR  THE  FUTURE 


A.  Improvements  to  the  Proof  Strategy 

There  are  two  areas  in  which  THINKER’S  strategies  are 
not  driven  by  any  assurance  that  the  result  of  applying  the 
strategy  will  aid  in  getting  "closer"  to  the  goal.  One  is  in 
the  FIND-rules,  where  THINKER  applies  the  rules  of  inference 
to  all  antecedent  lines  regardless  of  whether  it  is 
guaranteed  that  they  will  be  useful  lines.44  It  may  be  that 
there  is  some  way  to  recognize  "usefulness"  of  the  possible 
applications  of  the  rules  in  FIND,  and  to  do  the  ones  judged 
"more  useful"  first.  But  inspection  of  the  sample  proofs 
shows  that  the  unbridled  application  of  the  FIND  strategy 
does  not  unduly  lengthen  proofs. 

    The other place where inference is not goal-driven is
in the TRYCHAINING strategy.45 Here THINKER merely finds
some conditional (or disjunction) amongst the antecedent
lines for which the consequent (or other disjunct) is not
also an antecedent line, and sets an appropriate goal (the
antecedent of the conditional, or the negation of one of the
disjuncts). This strategy is useful in the sense that, if
the goal is proved, then a new MP (or MTP) can be performed.
But it is not immediately useful in the sense that the new
MP (or MTP) will help the proof of the next higher goal.


44 Of course, THINKER only applies "structure-reducing rules"
in FIND -- those rules which make their conclusion "less
complex" than the most complex premise of the rule. E.g., MP
(from Φ and (Φ→Ψ) infer Ψ) is such a rule. Thus the FIND
strategy will terminate in a reasonably short time,
depending only on the number and complexity of antecedent
lines, for once an antecedent line is used it is not again
to be used in the same rule. Except for FINDUI, which is
handled separately as discussed in Chapter VII, there is no
rule which is either used again or increases structure. (So
for example the rule Adj (from Φ and Ψ infer (Φ&Ψ)) is not a
part of the FIND strategy.)

45 The FINDNEGS strategy, on the other hand, is goal-driven.
As soon as one of the generated show lines is proved, the
proof is completed up to a higher show level. In the
discussion of the Steamroller of Chapter VII it is rather
the TRYCHAINING strategy which is unfocused.

    It seems clear that it would be desirable to first look
for immediately useful CHAININGs before trying all
CHAININGs. One type of immediately useful chaining is
BACKCHAINING: if Ψ is the most recent goal, look for (Φ→Ψ)
lines (or (Φ∨Ψ) lines) in the antecedent lines and, if found,
set as new goal the formula which is the token of Φ. If this
can be proved then the most recent goal can be immediately
cancelled. Somewhat more generally, if Ψ is any previous
goal, look for (Φ→Ψ) lines (or (Φ∨Ψ) lines) amongst the
antecedent lines and, if found, set the formula
corresponding to Φ as a goal. If this can be proved, then Ψ
will be a new antecedent line and (perhaps after some small
amount of further processing) all goals up to and including
the Ψ goal can be cancelled.
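
The difference between unfocused chaining and backchaining can be made concrete with a small sketch. Everything here is an assumption made for illustration: formulas are nested tuples, the function names are hypothetical, and for a disjunction the sketch sets the negation of the other disjunct as the goal (as in the description of TRYCHAINING), since that is what an MTP step needs. It is not THINKER's SPITBOL code.

```python
def negate(formula):
    """Return the negation of a formula, cancelling a leading 'not'."""
    if isinstance(formula, tuple) and formula[0] == "not":
        return formula[1]
    return ("not", formula)


def chaining_goals(antecedent_lines):
    """Unfocused chaining: a goal for every conditional or disjunction
    whose consequent (or other disjunct) is not already a line."""
    goals = []
    for line in antecedent_lines:
        if not isinstance(line, tuple):
            continue
        if line[0] == "->" and line[2] not in antecedent_lines:
            goals.append(line[1])
        elif line[0] == "or":
            if line[2] not in antecedent_lines:
                goals.append(negate(line[1]))
            if line[1] not in antecedent_lines:
                goals.append(negate(line[2]))
    return goals


def backchain_goals(goal, antecedent_lines):
    """Backchaining: only goals whose proof yields `goal` in one MP or MTP."""
    goals = []
    for line in antecedent_lines:
        if not isinstance(line, tuple):
            continue
        if line[0] == "->" and line[2] == goal:
            goals.append(line[1])
        elif line[0] == "or":
            if line[2] == goal:
                goals.append(negate(line[1]))
            elif line[1] == goal:
                goals.append(negate(line[2]))
    return goals


lines = [("->", "p", "q"), ("or", "r", "q"), ("->", "s", "t"), "u"]
print(chaining_goals(lines))
print(backchain_goals("q", lines))
```

With the goal q, backchaining proposes only the two subgoals that lead directly to q, while unfocused chaining also proposes subgoals that would merely extend the proof.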

    These alterations are extremely easy to implement, and
are currently being worked on. A harder-to-implement
generalization would be the following. For all goals Ψ, look
for (Φ→Θ) antecedent lines (or the equivalent disjunction)
such that Θ allows for a one step proof of Ψ. If found, set
Φ as a new goal. This would aid proofs immensely, but it
seems that THINKER would have to produce little subproofs to
check whether one step on Θ would yield Ψ. Such attempts
would not occur within the main proof matrix and would
amount to "trying out a short proof to see whether it works,
and if not discarding the attempt". One of the design goals
has been not to allow THINKER to "throw away" proofs once
started. Without this constraint, the actual work done by
THINKER remains hidden -- we do not know what actual proof
steps have been taken. In the present design, everything
that THINKER does can be inspected by looking at the final
proof it produces.

    Even with the addition of BACKCHAIN, there will remain
proofs that are very difficult for THINKER, for example, the
Steamroller. As I discussed in Chapter VII, the problem here
is that THINKER sets up an incredibly large number of goals
in the CHAINING strategy, and it is only some of them that
are worthwhile (although these worthwhile goals need not be
immediately useful in the sense required by BACKCHAIN). What
we need is to find which CHAIN to start with; but since
there may not be any overt clue (e.g., even the one step
strategy suggested in the previous paragraph may not yield
enough information) as to which CHAIN to start with, it
seems that we need to check them all. A suggestion that
comes to mind is to start concurrent (parallel) processes
for each of the possible CHAINs. The first one to prove a
goal generated by the CHAIN strategy sends a message to its
parent process, which in turn stops all the other child
processes. Of course one of these concurrent processes might
itself spawn a wide range of child processes if it should
come to the point where a new instance of CHAINING is called
for. Control among the children of a given process probably
should allow a process to continue until it adds a line to
its proof matrix, and then pass to the next sibling process.
Probably also, no sibling should be allowed to add a new
CHAINING goal until all siblings are ready to do so.
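
The "first child to succeed stops its siblings" idea can be sketched as follows. This is only an illustration under stated assumptions: it uses Python threads rather than anything available in SPITBOL, the names `race_chains`, `hopeless` and `fruitful` are hypothetical, and the round-robin passing of control between siblings suggested above is not modelled.

```python
import threading
from concurrent.futures import ThreadPoolExecutor, as_completed


def race_chains(chains):
    """Run every candidate chain concurrently and return (name, result)
    for the first chain that proves its goal, or None if none does."""
    stop = threading.Event()
    winner = None
    with ThreadPoolExecutor(max_workers=len(chains)) as pool:
        futures = {pool.submit(fn, stop): name for name, fn in chains.items()}
        for future in as_completed(futures):
            result = future.result()
            if result is not None and winner is None:
                winner = (futures[future], result)
                stop.set()  # the parent tells the remaining children to stop
    return winner


def hopeless(stop):
    """Stands in for a superfluous chain: it works until told to stop."""
    while not stop.wait(0.01):
        pass
    return None


def fruitful(stop):
    """Stands in for the chain that actually proves its goal."""
    return "goal proved"


print(race_chains({"W1": hopeless, "W2": hopeless, "C": fruitful}))
```

Each child receives the shared stop signal, so a parent that hears of one success can abandon the other attempts. The sketch leaves open how the children's separate proof matrices would be kept.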

A  number  of  design  decisions  have  to  be  made  before 
this  is  attempted.  For  one  thing,  there  is  the  problem  of 
sharing  between  parent  process  and  child  process  such  data 
structures  as  ANTELINES ,  but  having  separate  proof  matrices 
(since  the  parent  does  not  know  which  child  is  going  to 
"win").  A  further  problem  is  that  SPITBOL  does  not  support 
concurrent  processing  --  although  the  SNOBOL-based  language 
ICON does. Further work on this area is planned for the near
future.


B.  Identity  and  Functions 

    Two obvious extensions to THINKER are to extend the
system to handle identity and arbitrary function symbols. In
fact, THINKER already recognizes identity-formulae as
well-formed by the formation rule

        If α and β are terms, then α = β is a formula

However, there are no heuristics involving identity and so
the only identity-arguments that can be proved are those
which do not essentially involve identity. The rules I would
propose are these:

        ⊢ (Ax) x = x            (Ref)

and

        Φ, α = β ⊢ Φ'           (LL)

where Φ' comes from Φ by proper substitution of some
occurrence of α in Φ by β (α and β are terms).46 One needs,
however, some strategies as to when to use these rules. As a
first step, I propose to implement Ref by adding an
additional check in FINDCONTRA: if a line is of the form
¬α = α, then introduce α = α as a line (annotated 'Ref') and
immediately box and cancel. LL is a harder rule to
implement. One could follow the "blind" procedure FINDUIS
and merely do all the substitutions that are available. In
the end one may have to have such a strategy to fall back
on, but one hopes that a more efficient and controlled use
of LL is sometimes possible. For example, given that α = β is
in the proof, one should look for the pairs Φα and ¬Φβ
(where Φα and Φβ differ only in that one has an occurrence
of α where the other has occurrences of β). With a somewhat
fancier TEMPLATE mechanism than THINKER now has, such a
strategy should be implementable. In a similar vein, given
α = β we should look for cases of the form Φα, (Φβ→Ψ) as an
opportunity to do MP (and the like for the other rules).
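
The two identity checks proposed here can be sketched in a few lines. This is an illustration under assumptions, not a description of THINKER: formulas are nested tuples, the function names are hypothetical, and the pair test is done by rewriting every α to β in both formulas and comparing, which is a simplification of "differ only in occurrences of α and β".

```python
def rewrite(expr, a, b):
    """Replace every occurrence of term a in expr by term b."""
    if expr == a:
        return b
    if isinstance(expr, tuple):
        return tuple(rewrite(part, a, b) for part in expr)
    return expr


def ref_contradiction(line):
    """True if the line has the form not(t = t), which Ref refutes at once."""
    return (isinstance(line, tuple) and line[0] == "not"
            and isinstance(line[1], tuple) and line[1][0] == "="
            and line[1][1] == line[1][2])


def ll_clashes(identity, lines):
    """Given ("=", a, b), find pairs (phi, not-psi) among the lines such
    that phi and psi coincide once every a is rewritten to b."""
    _, a, b = identity
    clashes = []
    for neg in lines:
        if isinstance(neg, tuple) and neg[0] == "not":
            for pos in lines:
                if pos != neg and rewrite(pos, a, b) == rewrite(neg[1], a, b):
                    clashes.append((pos, neg))
    return clashes


lines = [("P", "a", "c"), ("not", ("P", "b", "c")), ("Q", "a")]
print(ll_clashes(("=", "a", "b"), lines))
```

A clash found this way marks a place where one application of LL would produce an outright contradiction, which is the controlled use of LL hoped for above.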

46 These are not the rules of Kalish & Montague. They give
these two:

        Φ' ⊢ (Aα)(α = β→Φ)

        (Aα)(α = β→Φ) ⊢ Φ'

and present Ref and LL as derived rules. However, from Ref
and LL, they in turn can be derived, and so the present
rules are adequate.

    In theory, arbitrary function symbols are easy to
implement. One characterizes them as:

        fi are (i-place) function symbols

        If f is an i-place function symbol, and α1,...,αi are
        terms, then f(α1,...,αi) is a term.

And since these are terms, they operate in a proof exactly
as any other term (variable or constant). The real problem
is the large number of terms this will introduce into a
proof and the concomitantly large number of possible UIs
this gives rise to. With this large number of new terms,
however, the identity problems re-emerge with a vengeance,
especially in the proposed "blind" identity substitution
procedure.
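
How fast the stock of terms grows can be seen from a small enumeration. The sketch below is illustrative only: the constants, the function symbols and their arities are hypothetical choices, and terms are represented as nested tuples.

```python
from itertools import product


def terms_up_to_depth(constants, functions, depth):
    """All terms built from the constants with at most `depth` nested
    function applications; `functions` maps each symbol to its arity."""
    terms = set(constants)
    for _ in range(depth):
        new_terms = set(terms)
        for name, arity in functions.items():
            for args in product(terms, repeat=arity):
                new_terms.add((name, *args))
        terms = new_terms
    return terms


# Two constants, one 1-place and one 2-place function symbol.
counts = [len(terms_up_to_depth(["a", "b"], {"f": 1, "g": 2}, d))
          for d in range(3)]
print(counts)
```

Every one of these terms is a candidate for UI, and every pair of them a candidate for an identity substitution, which is the growth referred to above.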

These  aspects,  though  tedious,  could  easily  be  grafted 
onto  the  current  version  of  THINKER.  With  such  additions, 
THINKER  could  be  more  easily  compared  to  the  published 
versions  of  other  theorem  provers.  For,  in  these 
publications,  the  main  attempt  has  been  to  prove  certain 
theorems  in  elementary  mathematics  such  as  group  theory, 
ring  theory,  and  semi-group  theory.  (One  adds  the  axioms  of 
the  theory  as  premises  of  the  arguments).  In  Kalish  & 
Montague,  Chapters  8  -  11,  this  method  is  used  to  generate 
various  extensions  of  the  theory.  In  particular,  the 
following  theories  are  developed:  the  theory  of  commutative 
ordered  fields,  the  theory  of  real  numbers,  the  theory  of 
convergence,  differential  calculus,  and  integral  calculus. 
Given the ease with which various theorems of these theories
are proved in Kalish & Montague, it should not be surprising
if THINKER, so augmented, also could prove interesting
theorems in these fields.

C.  Natural  Language  Processing  and  Other  Areas 

Any  of  the  areas  mentioned  in  Chapter  I  would  be  a 
suitable  test  ground  for  THINKER.  However,  the  area  I  am 
interested  in  is  natural  language  processing.  In  particular, 
I  am  interested  in  the  style  of  grammar  promoted  by  Gerald 
Gazdar  and  his  associates  (see  Gazdar  1981,  Sag  1981,  Gazdar 
et  al  forthcoming;  see  also  Schubert  &  Pelletier  1982  for 
further  details).  In  this  conception  of  grammar,  the 
semantic  component  is  a  (typed)  lambda  calculus  in  the  style 
of  Montague  (1970,  1973).  I  would  like  to  allow  THINKER  to 
accept  formulae  that  are  well-formed  in  the  lambda  calculus 
and  perform  logical  operations  on  them,  i.e.,  construct 
proofs  in  the  lambda  calculus.  Besides  an  expansion  in  the 
set  of  formulae  that  are  recognized  by  THINKER,  the  lambda 
calculus  employs  one  more  rule  of  inference:  lambda 
conversion.  Since  such  a  rule  does  no  more  than  find  an 
equivalent  of  a  formula,  this  is  extremely  easy  to  program. 
Indeed,  I  have  in  mind  that  the  parsing  component  of  the 
natural  language  system  will  produce  unconverted  lambda 
expressions  corresponding  to  English  sentences.  THINKER  will 
perform  the  appropriate  lambda  conversions  and  arrive  at 
more  "natural"  equivalent  expressions.  In  the  Montague 
theory, these expressions are then further reduced by means


IX.  APPENDIX I: THINKER'S PROOFS OF SOME THEOREMS

This  Appendix  gives  the  proofs  generated  by  THINKER  for 
a  selection  of  problems  presented  to  it.  The  selection 
comprises  just  those  used  to  illustrate  points  made  in  the 
text  concerning  how  proofs  proceed  and  concerning 
comparisons  with  other  theorem  provers. 

    It should be emphasized that the proofs are exactly as
produced by THINKER. There has been no postprocessing of any
sort other than conversion of internal representation to
printer format. (For example, the symbol → which is used on
output is internally '>'. Similar remarks hold for sub- and
superscripts.) A problem is presented to THINKER by typing in

        premise ((p∨q)→(r&s))

        prove (q→s)

for example. There is no preprocessing of formulae.47 The
internal SPITBOL clock and statement count have been
appended to these proofs here by hand. (It turned out to be
very difficult to format that part of THINKER'S output so as
to stay within the required margins.)

We start with some simple propositional logic problems.

1.  ⊢ (¬¬p→p)

2.  ⊢ (((p→q)→p)→p)

3.  (q→r), (r→(p&q)), (p→(q∨r)) ⊢ (p↔q)

4.  ⊢ ((p→q)↔(¬q→¬p))

5.  ⊢ (¬(p→q)→(q→p))

6.  ⊢ (p↔p)

7.  ⊢ (p∨¬¬¬p)

8.  ⊢ (((p↔q)↔r)↔(p↔(q↔r)))

9.  ⊢ (((p∨q)→(p∨r))→(p∨(q→r)))

10. ⊢ ((¬p→q)→(¬q→p))

11. ⊢ (((p&q)∨(p&¬q))∨((¬p&q)∨(¬p&¬q)))


47 Problems with no premises are just entered with prove
plus formula.

(1) is the "hardest" theorem proved by the "new Logic
Theorist". (5) is the "hardest" theorem proved by Siklossy
with a breadth first search. (10) is the theorem judged by
Siklossy to be "hardest" of the first 52 theorems of
Whitehead & Russell (1910), and (9) his judgment of the
"hardest" one of the first 62 theorems. (4) is a
biconditional version of the "most difficult" problem proved
by the "original Logic Theorist". (That is, the original
Logic Theorist could only prove one direction of it:
((p→q)→(¬q→¬p)).) (7) is a problem which, it has been
proved, the "original Logic Theorist" could not prove, and
(11) is a problem that cannot be proved by unit resolution
(nor therefore by input resolution). For details of these
items, see Chapter VII, Section A. (2) is known in logic as
"Peirce's Law" after the 19th century American logician
Charles Sanders Peirce. (3) illustrates THINKER'S proving an
argument with premises.

    THINKER'S proof of (6) is interesting. Most elementary
students, in trying to prove any biconditional, will try to
show each direction separately. Here, however, both
directions are the same. THINKER, unlike most elementary
students, notices this and does not bother to write a 'show'
line for the "other" direction. (8), the associativity of
↔, is the "hardest" of the propositional theorems THINKER
has been asked to prove. (It is Theorem 95 of Kalish &
Montague.)

The conversion to clausal form requires the validity of
certain propositional equivalences and certain quantifier
equivalences (about moving quantifiers through connectives
to the front of a formula). A resolution based prover cannot
prove these since it assumes them. A partial list is given
in Chapter VII, Section C. Some of them are:

12. ⊢ ((p↔q)↔((q∨¬p)&(¬q∨p)))

13. ⊢ ((p∨(q&r))↔((p∨q)&(p∨r)))

14. ⊢ ((Ax)(P⁰→Qx)↔(P⁰→(Ax)Qx))

(The P⁰ in 14 indicates any formula with no free occurrence
of x, the variable of quantification.)
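
As a sanity check independent of any proof system, the propositional items can be confirmed to be tautologies by brute-force truth tables. The sketch below is only such a check, not THINKER's method; it encodes Peirce's Law, the contraposition biconditional, the associativity of the biconditional, and the distribution of disjunction over conjunction as I read them from the lists, and the helper names are hypothetical.

```python
from itertools import product


def implies(a, b):
    return (not a) or b


def iff(a, b):
    return a == b


def is_tautology(formula, n_vars):
    """True if the formula is true under every assignment of truth values."""
    return all(formula(*values)
               for values in product([False, True], repeat=n_vars))


peirce = lambda p, q: implies(implies(implies(p, q), p), p)
contraposition = lambda p, q: iff(implies(p, q), implies(not q, not p))
iff_assoc = lambda p, q, r: iff(iff(iff(p, q), r), iff(p, iff(q, r)))
distribution = lambda p, q, r: iff(p or (q and r), (p or q) and (p or r))

for name, formula, n in [("Peirce", peirce, 2),
                         ("contraposition", contraposition, 2),
                         ("associativity", iff_assoc, 3),
                         ("distribution", distribution, 3)]:
    print(name, is_tautology(formula, n))
```

A truth table settles validity but produces no derivation; the proofs that follow show how each theorem is actually established line by line.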


 1  *show ((p↔q)↔((q∨¬p)&(¬q∨p)))
 2  *show (((q∨¬p)&(¬q∨p))→(p↔q))
 3  ((q∨¬p)&(¬q∨p))                      ASSUME
 4  *show (p↔q)
 5  *show (q→p)
 6  q                                    ASSUME
 7  *show p
 8  ¬p                                   ASSUME
 9  (q∨¬p)                               3,S
10  (¬q∨p)                               3,S
11  p                                    10,6,MTP
12  *show (p→q)
13  p                                    ASSUME
14  *show q
15  ¬q                                   ASSUME
16  (q∨¬p)                               3,S
17  (¬q∨p)                               3,S
18  ¬p                                   15,16,MTP
19  p                                    13,R
20  (p↔q)                                12,5,CB
21  *show ((p↔q)→((q∨¬p)&(¬q∨p)))
22  (p↔q)                                ASSUME
23  *show ((q∨¬p)&(¬q∨p))
24  *show (q∨¬p)
25  ¬(q∨¬p)                              ASSUME
26  (p→q)                                22,BC
27  (q→p)                                22,BC
28  *show q
29  ¬q                                   ASSUME
30  ¬p                                   29,26,MT
31  (q∨¬p)                               30,ADD
32  ¬(q∨¬p)                              25,R
33  (q∨¬p)                               28,ADD
34  *show (¬q∨p)
35  ¬(¬q∨p)                              ASSUME
36  (p→q)                                22,BC
37  (q→p)                                22,BC
38  *show ¬q
39  q                                    ASSUME
40  p                                    37,39,MP
41  (¬q∨p)                               40,ADD
42  ¬(¬q∨p)                              35,R
43  (¬q∨p)                               38,ADD
44  ((q∨¬p)&(¬q∨p))                      24,34,ADJ
45  ((p↔q)↔((q∨¬p)&(¬q∨p)))              21,2,CB

CPU time: 132 msec
Statements: 9784


 1  *show ((p∨(q&r))↔((p∨q)&(p∨r)))
 2  *show (((p∨q)&(p∨r))→(p∨(q&r)))
 3  ((p∨q)&(p∨r))                        ASSUME
 4  *show (p∨(q&r))
 5  ¬(p∨(q&r))                           ASSUME
 6  (p∨q)                                3,S
 7  (p∨r)                                3,S
 8  *show p
 9  ¬p                                   ASSUME
10  q                                    9,6,MTP
11  r                                    9,7,MTP
12  *show (q&r)
13  (q&r)                                10,11,ADJ
14  ¬(p∨(q&r))                           5,R
15  (p∨(q&r))                            12,ADD
16  (p∨(q&r))                            8,ADD
17  *show ((p∨(q&r))→((p∨q)&(p∨r)))
18  (p∨(q&r))                            ASSUME
19  *show ((p∨q)&(p∨r))
20  *show (p∨q)
21  ¬(p∨q)                               ASSUME
22  *show p
23  ¬p                                   ASSUME
24  (q&r)                                23,18,MTP
25  q                                    24,S
26  r                                    24,S
27  (p∨q)                                25,ADD
28  ¬(p∨q)                               21,R
29  (p∨q)                                22,ADD
30  *show (p∨r)
31  ¬(p∨r)                               ASSUME
32  *show p
33  ¬p                                   ASSUME
34  (q&r)                                33,18,MTP
35  q                                    33,20,MTP
36  r                                    34,S
37  (p∨r)                                36,ADD
38  ¬(p∨r)                               31,R
39  (p∨r)                                32,ADD
40  ((p∨q)&(p∨r))                        20,30,ADJ
41  ((p∨(q&r))↔((p∨q)&(p∨r)))            17,2,CB

CPU time: 121 msec
Statements: 9461


*show ((P⁰→(Ax1)Q(x1))→(Ax1)(P⁰→Q(x1)))
(P⁰→(Ax1)Q(x1))
*show (Ax1)(P⁰→Q(x1))
*show (P⁰→Q(x1))

In Chapter VII, Section D, (15) was discussed in detail. Other
examples here show THINKER proving non-trivial predicate
logic arguments.

15. (Ex)(P⁰→Qx), (Ex)(Qx→P⁰) ⊢ (Ex)(P⁰↔Qx)

16. ¬(Ex)(Sx&Qx), (Ax)(Px→(Qx∨Rx)), ¬(Ex)Px→(Ex)Qx,
    (Ax)((Qx∨Rx)→Sx) ⊢ (Ex)(Px&Rx)

17. (Ex)Px, (Ax)(S1x→(¬S2x∨¬Rx)), (Ax)(Px→(S1x&S2x)),
    (Ax)(Px→Qx)∨(Ex)(Px&Rx) ⊢ (Ex)(Qx&Px)

18. (Ex)Px↔(Ex)Qx, (Ax)(Ay)((Px&Qx)→(Rx↔Sx))
    ⊢ (Ax)(Px→Rx)↔(Ax)(Qx→Sx)

19.  ( Ax  )  (  Px-*Rx  )  ,  (Ax)  (  (S  ,  x&S2x)^Px)  , 

(Ex  )  (Rx&Qx  )->  ( Ax  )  (  S  ,  x-^Rx  )  f-  ( Ax  )  (  S  2  x-^S  ,  x  ) 

20. (Ax)P1x→(Ax)Qx, (Ax)(Qx∨Rx)→(Ex)(Qx&Sx),
    (Ex)Sx→(Ax)(P2x→P3x) ⊢ (Ax)((P1x&P2x)→P3x)

21. ⊢ [((Ex)Px↔(Ex)Qx)&(Ax)(Ay)((Px&Qy)→(Sx↔Ry))] →
    ((Ax)(Px→Sx)↔(Ax)(Qx→Rx))


 1  *show (Ex1)(P⁰↔Q(x1))
 2  ¬(Ex1)(P⁰↔Q(x1))                     ASSUME
 3  (Ex1)(P⁰→Q(x1))                      PREM
 4  (P⁰→Q(z1))                           3,EI
 5  (Ex1)(Q(x1)→P⁰)                      PREM
 6  (Q(z2)→P⁰)                           5,EI
 7  (Ax1)¬(P⁰↔Q(x1))                     2,QN
 8  ¬(P⁰↔Q(z1))                          7,UI
 9  ¬(P⁰↔Q(z2))                          7,UI
10  *show (P⁰↔Q(z1))
11  *show (Q(z1)→P⁰)
12  Q(z1)                                ASSUME
13  *show P⁰
14  ¬P⁰                                  ASSUME
15  ¬Q(z2)                               14,6,MT
16  *show (P⁰↔Q(z2))
17  (Q(z2)→P⁰)                           6,R
18  *show (P⁰→Q(z2))
19  P⁰                                   ASSUME
20  ¬P⁰                                  14,R
21  (P⁰↔Q(z2))                           18,17,CB
22  ¬(P⁰↔Q(z2))                          9,R
23  (P⁰↔Q(z1))                           4,11,CB

CPU time: 103 msec
Statements: 6651


*show (Ex1)(P(x1)&R(x1))

The set theoretic examples given to THINKER were discussed
in Chapter VII, Section I. Letting P stand for "is a member
of", THINKER proves such things as: there is no "Russell
set"; if there were an "anti Russell set" then not every set
has a complement; given the axiom of separation there is no
"universal set"; and there is no set of "non-circular sets".
(25) is the proof of the symmetry of set identity (Q stands
for this relation) given the definition of set identity in
terms of set membership. This is the problem mentioned by
de Champeaux (1979) as not being solvable by his system.

21. ⊢ ¬(Ey)(Ax)(Pxy↔¬Pxx)

22. ⊢ (Ey)(Ax)(Pxy↔Pxx) → ¬(Ax)(Ey)(Az)(Pzy↔¬Pzx)

23. ⊢ (Az)(Ey)(Ax)(Pxy↔(Pxz&¬Pxx)) → ¬(Ez)(Ax)Pxz

24. ⊢ ¬(Ey)(Ax)(Pxy↔¬(Ez)(Pxz&Pzx))

25. (Au)(Aw)(Quw↔(Az)(Pzu↔Pzw)) ⊢ (Ax)(Ay)(Qxy↔Qyx)
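
As an independent sanity check on the first of these problems (the "Russell set" claim), one can search small finite domains for a countermodel. The following is only a brute-force sketch and is not part of THINKER; the function names and the choice of domain sizes are assumptions made for illustration, and finding no countermodel on small domains is of course weaker than a proof.

```python
from itertools import product


def russell_set_exists(size, member):
    """True if some y satisfies: for every x, (x in y) iff not (x in x).

    member[x][y] encodes the two-place relation "x is a member of y".
    """
    dom = range(size)
    return any(
        all(member[x][y] == (not member[x][x]) for x in dom)
        for y in dom
    )


def find_countermodel(max_size=3):
    """Try every membership relation on domains of size 1..max_size."""
    for n in range(1, max_size + 1):
        for bits in product([False, True], repeat=n * n):
            member = [list(bits[i * n:(i + 1) * n]) for i in range(n)]
            if russell_set_exists(n, member):
                return n, member  # a model in which a Russell set exists
    return None  # no countermodel found in the searched range


print(find_countermodel())
```

The search comes back empty because taking x = y forces "y in y" to equal its own negation, which is the same contradiction the natural deduction proof derives.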
 1. *show ¬(Ey1)(Ax1)(P(x1,y1)↔¬P(x1,x1))
 2.  (Ey1)(Ax1)(P(x1,y1)↔¬P(x1,x1))         ASSUME
 3.  (Ax1)(P(x1,z1)↔¬P(x1,x1))              2,EI
 4.  (P(z1,z1)↔¬P(z1,z1))                   3,UI
 5.  (P(z1,z1)→¬P(z1,z1))                   4,BC
 6.  (¬P(z1,z1)→P(z1,z1))                   4,BC
 7. *show P(z1,z1)
 8.  ¬P(z1,z1)                              ASSUME
 9.  P(z1,z1)                               8,6,MP
10.  ¬P(z1,z1)                              5,7,MP

CPU time: 73 msec
Statements: 3641
In Chapter VII, Section B, three different versions of an
equivalence which could not be proved by Bledsoe's systems
were mentioned. They are:

26. ⊢ (((p&(q→r))→s) ↔ ((¬p∨(q∨s))&(¬p∨(¬r∨s))))

27. ⊢ (Ax)((Pa&(Px→Pb))→Pc) ↔
      (Ax)((¬Pa∨(Px∨Pc))&(¬Pa∨(¬Pb∨Pc)))

28. ⊢ ((Ax)((Pa&(Px→(Ey)(Py&Rxy)))→(Ez)(Ew)(Pz&Rxw&Rwz)) ↔
      (Ax)((¬Pa∨(Px∨(Ez)(Ew)(Pz&Rxw&Rwz)))&
           (¬Pa∨(¬(Ey)(Py&Rxy)∨(Ez)(Ew)(Pz&Rxw&Rwz)))))

(The line numbering on the proof of (28) is a bit peculiar
because lines were so long as to require breaking across
boundaries. The proof line number ended up on the same line
of the page as the annotation. Thus, for example, line 12 of
the proof actually starts one line earlier on the page than
the line number would seem to indicate.)
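
The propositional version of this equivalence, (26), can be checked mechanically with a truth table. The sketch below is an illustration only and says nothing about how THINKER or Bledsoe's systems work; the helper names are hypothetical.

```python
from itertools import product


def implies(a, b):
    """Material conditional."""
    return (not a) or b


def lhs(p, q, r, s):
    # ((p & (q -> r)) -> s)
    return implies(p and implies(q, r), s)


def rhs(p, q, r, s):
    # ((~p v (q v s)) & (~p v (~r v s)))
    return ((not p) or q or s) and ((not p) or (not r) or s)


def is_tautology():
    """Check that both sides agree on all 16 truth assignments."""
    return all(lhs(*v) == rhs(*v) for v in product([False, True], repeat=4))


print(is_tautology())
```

Both sides reduce to ¬p ∨ s ∨ (q & ¬r), so the two columns of the truth table coincide.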




show (((p&(q→r))→s)↔((¬p∨(q∨s))&(¬p∨(¬r∨s))))

*show (((¬p∨(q∨s))&(¬p∨(¬r∨s)))→((p&(q→r))→s))

((¬p∨(q∨s))&(¬p∨(¬r∨s)))  ASSUME

show ((p&(q→r))→s)

(p&(q→r))  ASSUME
Next follow proofs relevant to the discussions in Chapter
VII on the EI/UI problem.

29. ⊢ (Ey)(Ax)(Py→Px)

30. ⊢ (Ex)(Ey)(Pxy→(Ax)(Ay)Pxy)

31. ⊢ (Ex)(Ay)(Az)((Py→Qz)→(Px→Qx))

32. ⊢ ((Ax)(Ay)(Ez)(Aw)((Px&Qy)→(Rz&Sw)) →
      ((Ex)(Ey)(Px&Qy)→(Ez)Rz))
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1  5 
16 


*show  ( Ey ,  )  (Ax,  )  ( P  ]  ( y  ,  )-*P  {  (  x  ,  )  ) 
“■  ( Ey ,  )  (Ax,  )  (P{  (y  ,  )->P]  (x,  )  ) 
(Ay,  )'i(Ax1  )  (P{  (y,  )^P]  (x,  )  ) 

■’  ( Ax  ,  )  (P{  (z,  )-»Pj  (x,  )  ) 

(Ex,  )-*(P{  (z,  )-*P{  (x,  )  ) 

-*(Pj  (z,  )+P\  ( z  8  )  ) 

*show  (P  j  (z  ,  )-*P]  (z8  )  ) 

P{ (z, ) 

*show  P ; ( z  8 ) 

^P\ (z, ) 

n(Ax,  ) (P{ (z, )+P]  (x,  )  ) 

(Ex,  )  ”•  ( P 1  (  z  8  )-*P{  (x,  )) 

“,(P|  (z,  )+P\  (z, )  ) 

*show  (PJ  (  z  8  )-»P !  ( z  7 )  ) 

P{ ( z 8  ) 

“’PI  ( z  8  ) 
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5, El 

ASSUME 

ASSUME 
Finally we have the six subproblems of Andrew's Challenge,
which was fully discussed in Chapter VII, Section J. (The
six subproblems are what are on line 1 of the following six
proofs.)
X. APPENDIX II: A FLOW CHART OF THINKER'S PROOF STRATEGY

This appendix gives an informal flow-chart explanation
of the proof strategies used by THINKER. We start with a
discursive explanation of ONESTEP and SIMPLEPROOF, follow
this with the description of initialization, and then go on
to the description of PROOF. Certain footnotes to various
steps in PROOF give a more discursive explanation of the
steps.

A.  ONESTEP  and  SIMPLEPROOF 

ONESTEP(φ) -- use formula φ to prove the most recent goal
[globally given] in one step. This looks to the goal and
φ, and decides what kind of formula must be in [the
global] ANTELINES to yield a proof of the goal using φ.
If it finds such an antecedent line, the function
introduces the pair <goal, annotation> into ANTELINES and
returns true. (ONESTEP might introduce complexity. If
the goal is, for example, (P∨Q) and φ=P, then ONESTEP(P)
introduces <(P∨Q), line# ADD> into ANTELINES, where the
formula (P∨Q) is more complex than the one from which it
was generated, P. But the complexity is only "one level
higher" than φ, and in any case ONESTEP always
terminates the proof at a given show level.)

SIMPLEPROOF(φ) -- find a very simple proof of φ from the
[global] ANTELINES. If found, it introduces the pair
<φ, annotation> into ANTELINES and returns true.
(SIMPLEPROOF might introduce complexity. If φ=(P∨Q) and
P is in ANTELINES, then SIMPLEPROOF will introduce
<(P∨Q), line# ADD> into ANTELINES.) Unlike ONESTEP,
SIMPLEPROOF might use templates. If φ=P, SIMPLEPROOF
will look for (θ→P), among others, and if found see
whether θ's token is also in ANTELINES.

Note that whenever a formula is added to ANTELINES, whether
by these functions or others, all the templates are also
added, and the variable and constant tables are updated. The
reverse (deletion) is done whenever a line is removed from
ANTELINES. The relevant functions are ADDANTE(φ) and
DELANTE(φ).
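
The bookkeeping described above can be illustrated with a small sketch. This is not THINKER's code: the class name, the tuple encoding of formulas, and the restriction of "templates" to conditionals indexed by their consequent are all assumptions made for the example, and only two rules (Addition and a template lookup for modus ponens) are shown.

```python
class AnteLines:
    """Hypothetical stand-in for the global ANTELINES table."""

    def __init__(self):
        self.lines = {}      # formula -> (line number, annotation)
        self.templates = {}  # consequent -> set of conditionals ending in it

    def addante(self, formula, line_no, annotation):
        """ADDANTE: record the line and index any template it yields."""
        self.lines[formula] = (line_no, annotation)
        if isinstance(formula, tuple) and formula[0] == "->":
            self.templates.setdefault(formula[2], set()).add(formula)

    def delante(self, formula):
        """DELANTE: undo exactly what addante did for this formula."""
        self.lines.pop(formula, None)
        if isinstance(formula, tuple) and formula[0] == "->":
            self.templates.get(formula[2], set()).discard(formula)


def simpleproof(goal, ante, curline):
    """Try a one-rule proof of goal; on success add it at line curline."""
    # Addition: (A or B) follows from either disjunct.
    if isinstance(goal, tuple) and goal[0] == "or":
        for disjunct in goal[1:]:
            if disjunct in ante.lines:
                src = ante.lines[disjunct][0]
                ante.addante(goal, curline, f"{src} ADD")
                return True
    # Template lookup: find (A -> goal) and check that A is also present.
    for cond in ante.templates.get(goal, ()):
        if cond[1] in ante.lines:
            a, c = ante.lines[cond[1]][0], ante.lines[cond][0]
            ante.addante(goal, curline, f"{a},{c} MP")
            return True
    return False


ante = AnteLines()
ante.addante("P", 1, "PREM")
print(simpleproof(("or", "P", "Q"), ante, 2), ante.lines[("or", "P", "Q")])
```

Keeping the template index in step with `lines` inside `addante` and `delante` is the design point: every caller that adds or removes a line gets consistent tables without further effort.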
B. Initialization

(I here consider the case of theorems without premises.
Extra things happen to premises which would only obscure the
essential points being outlined here.)

PRMAT := NIL {proof table}
GOALST := NIL {goal stack}
ANTELINES := NIL {antecedent lines table}
CURLINE := 1 {global current line in PRMAT}
IND := false {flag for indirect proof}
TRY := false {flag for CHAINING strategies}
CURLEVEL := 0 {global current depth of goal stack}

Initial call takes the form:⁴⁸

READ(TO.PROVE);
N := CURLINE;
PROOF(TO.PROVE, N);

⁴⁸ Note that N (the line that this subproof starts on) is
called by value. At the end of an embedded subproof, this
line number will be available for boxing and cancelling. See
step 11.
C. PROOF

The description of PROOF(φ,n) follows. Formula φ is to
be proved, and its line number in PRMAT is n.

1. CURLINE := CURLINE+1; CURLEVEL := CURLEVEL+1;

2. PUSH(φ, GOALST) {add φ to the goal stack}
   PRMAT[n,1] := 'show' φ

3. if SIMPLEPROOF(φ) then GOTO 11
   {SIMPLEPROOF has added φ to the proof matrix and incremented
   CURLINE; step 11 wraps up the proof}

4. (splitting heuristics)

   a. if φ=(θ↔ψ) then
      i) if (ψ→θ) ∉ GOALST then
         a) PROOF((ψ→θ), CURLINE)
         b) if ONESTEP((ψ→θ)) then GOTO 11
      ii) if (θ→ψ) ∉ GOALST then PROOF((θ→ψ), CURLINE)
      iii) if SIMPLEPROOF((θ↔ψ)) then GOTO 11⁴⁹

   b. if φ=(θ&ψ) then
      i) if θ ∉ GOALST then
         a) PROOF(θ, CURLINE)
         b) if ONESTEP(θ) then GOTO 11
      ii) if ψ ∉ GOALST then PROOF(ψ, CURLINE)
      iii) if SIMPLEPROOF((θ&ψ)) then GOTO 11

⁴⁹ SIMPLEPROOF is guaranteed to succeed here (and in other
places like this when it is called) because the previous two
steps have put the corresponding conditionals into
ANTELINES, so that SIMPLEPROOF will find them and add the ↔
formula. This makes it possible to GOTO 11 and wrap up the
proof.


   c. if φ=(Aα)ψ and ¬FREE(α)⁵⁰ and ψ ∉ GOALST then
      i) PROOF(ψ, CURLINE)
      ii) GOTO 11

⁵⁰ FREE(α) checks whether α is free in antecedent lines.
Recall that in Kalish & Montague, "universal derivation"
takes the place of universal generalization. I.e., if α is
not free in antecedent lines then
   show (Aα)ψ
   X1
   ...
   Xn
can be boxed if ψ occurs unboxed amongst X1...Xn and there
are no uncancelled Show's amongst them.
5. (Assumptions: recall that φ is the line being proved)

   a. if φ=¬ψ and ψ ∉ ANTELINES then
      i) PRMAT(CURLINE,1) := ψ;
      ii) PRMAT(CURLINE,2) := "ASSUME";
      iii) CURLINE := CURLINE+1;
      iv) ADDANTE(ψ);
      v) if ONESTEP(ψ) then GOTO 11;
      vi) IND := true;

   b. if φ=(ψ→θ) and ψ ∉ ANTELINES then
      i) if SIMPLEPROOF(θ) then GOTO 11;
      ii) PRMAT(CURLINE,1) := ψ;
      iii) PRMAT(CURLINE,2) := "ASSUME";
      iv) CURLINE := CURLINE+1;
      v) ADDANTE(ψ);
      vi) if ONESTEP(ψ) then GOTO 11;
      vii) PROOF(θ, CURLINE);
      viii) GOTO 11;

   c. if ¬φ ∉ ANTELINES then
      i) PRMAT(CURLINE,1) := ¬φ;
      ii) PRMAT(CURLINE,2) := "ASSUME";
      iii) CURLINE := CURLINE+1;
      iv) ADDANTE(¬φ);
      v) if ONESTEP(¬φ) then GOTO 11;
      vi) IND := true;
6. (Forward inference)⁵¹

   a. OLDCUR := CURLINE;

   b. if FINDQN then [if ONESTEP(PRMAT(CURLINE,1))⁵² then
      GOTO 11 else GOTO 6b {more QNs}]

   c. if FINDDN then [if ONESTEP(PRMAT(CURLINE,1)) then
      GOTO 11 else GOTO 6c {more DNs}]

   d. if FINDBC then [if ONESTEP(PRMAT(CURLINE,1)) then
      GOTO 11 else GOTO 6d {more BCs}]

   e. if FINDS then [if ONESTEP(PRMAT(CURLINE,1)) then
      GOTO 11 else GOTO 6e {more SIMPs}]

   f. if FINDMP then [if ONESTEP(PRMAT(CURLINE,1)) then
      GOTO 11 else GOTO 6f {more MPs}]

   g. if FINDMT then [if ONESTEP(PRMAT(CURLINE,1)) then
      GOTO 11 else GOTO 6g {more MTs}]

   h. if FINDMTP then [if ONESTEP(PRMAT(CURLINE,1)) then
      GOTO 11 else GOTO 6h {more MTPs}]

   i. if OLDCUR≠CURLINE then GOTO 6a⁵³

   j. FINDALLEI {Existentially instantiate all lines you
      can. Mark these lines as being EIed under this
      CURLEVEL. Do not EI again under this level.}⁵⁴

⁵¹ FINDQN pushes negations to the inside of quantifiers. It
does not implement the reverse type of QN. FINDDN deletes
double negations; it does not add them. Thus all the
propositional and quantifier negation rules "simplify" (they
do not add complexity) and so will eventually terminate.

⁵² This will be a ONESTEP on the line just added to the proof
matrix by ONESTEP.

⁵³ Each time one of the FINDs discovers an inference, the
line is added to PRMAT with an appropriate annotation and
CURLINE is incremented. This step i is a means to go back to
try the FIND rules on the results of previous FINDs.

⁵⁴ FINDALLEI will only work once on a given formula under a
given CURLEVEL. So it terminates.
   k. if FINDUI then [if ONESTEP(PRMAT(CURLINE,1)) then
      GOTO 11 else GOTO 6k {more UIs}]⁵⁵

   l. if OLDCUR≠CURLINE then GOTO 6a⁵⁶

⁵⁵ FINDUI is instantiated in the following way: (a) FINDUI
maintains and updates A-lists; (b) an EI might put a
variable on the P-list. If the existentially quantified line
has a universally quantified line on its A-list, then that
universally quantified line is marked "bad". No "bad" line
can be UIed to any variable on the P-list by FINDUI.

⁵⁶ So, in general, the FORWARD INFERENCE applies all the
rules of inference that "simplify" formulae. This section is
finite, therefore, and will eventually terminate. So the two
ways to exit FORWARD INFERENCE are: (a) some ONESTEP
succeeds; (b) no more rules can be applied to any ANTELINES.
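
The control pattern of FORWARD INFERENCE, which is to apply only simplifying rules, go round again whenever a pass added a line, and stop either on success or when nothing new appears, can be sketched as a fixed-point loop. The following is a hedged illustration rather than THINKER's procedure: formulas are encoded as nested tuples, only DN, SIMP, BC and MP are included, and a plain equality test on the goal stands in for ONESTEP. All names are hypothetical.

```python
def forward_inference(lines, goal):
    """Saturate `lines` under a few simplifying rules.

    Returns (True, lines) as soon as the goal appears, or (False, lines)
    once a full pass adds nothing (the OLDCUR = CURLINE exit).
    """
    lines = list(lines)
    if goal in lines:
        return True, lines
    while True:
        oldcur = len(lines)
        for f in list(lines):  # iterate over a snapshot of the lines
            new = []
            if isinstance(f, tuple):
                op = f[0]
                if op == "not" and isinstance(f[1], tuple) and f[1][0] == "not":
                    new.append(f[1][1])                              # DN
                elif op == "and":
                    new += [f[1], f[2]]                              # SIMP
                elif op == "iff":
                    new += [("->", f[1], f[2]), ("->", f[2], f[1])]  # BC
                elif op == "->" and f[1] in lines:
                    new.append(f[2])                                 # MP
            for g in new:
                if g not in lines:
                    lines.append(g)
                    if g == goal:  # stand-in for a successful ONESTEP
                        return True, lines
        if len(lines) == oldcur:  # nothing new: no rule applies any more
            return False, lines


proved, derived = forward_inference([("and", "P", ("iff", "P", "Q"))], "Q")
print(proved)
```

Termination in this sketch follows the same reasoning as the footnote: every derived formula is built from subformulas of the starting lines, so only finitely many distinct lines can ever be added.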
7. (TRYNEG) if IND then

   a. if ¬(θ→ψ) ∈ ANTELINES and (θ→ψ) ∉ GOALST then
      i) PROOF((θ→ψ), CURLINE)
      ii) if ONESTEP((θ→ψ)) then GOTO 11⁵⁷

   b. if ¬(θ↔ψ) ∈ ANTELINES and (θ↔ψ) ∉ GOALST then
      i) PROOF((θ↔ψ), CURLINE)
      ii) if ONESTEP((θ↔ψ)) then GOTO 11

   c. if ¬(θ&ψ) ∈ ANTELINES and (θ&ψ) ∉ GOALST then
      i) PROOF((θ&ψ), CURLINE)
      ii) if ONESTEP((θ&ψ)) then GOTO 11

   d. if ¬(θ∨ψ) ∈ ANTELINES then
      i) if θ ∉ GOALST then
         a) PROOF(θ, CURLINE);
         b) PRMAT(CURLINE,1) := (θ∨ψ);
         c) PRMAT(CURLINE,2) := "Add";
         d) CURLINE := CURLINE+1;
         e) if ONESTEP((θ∨ψ)) then GOTO 11
      ii) else if ψ ∉ GOALST then
         a) PROOF(ψ, CURLINE);
         b) PRMAT(CURLINE,1) := (θ∨ψ);
         c) PRMAT(CURLINE,2) := "Add";
         d) CURLINE := CURLINE+1;
         e) if ONESTEP((θ∨ψ)) then GOTO 11

⁵⁷ ONESTEP in this and the following must succeed, since
there must be an explicit contradiction here. I put it this
way because in the Kalish & Montague system, to cancel by a
contradiction, both halves of the contradiction must be
"below" the line cancelled. If this is not otherwise so,
ONESTEP will Repeat the appropriate line.


8. (Chaining)

   a. if (θ→ψ) ∈ ANTELINES and θ ∉ GOALST and ψ ∉
      ANTELINES then
      i) PROOF(θ, CURLINE);
      ii) GOTO 6;

   b. if (θ∨ψ) ∈ ANTELINES then
      i) if ψ ∉ ANTELINES and ¬θ ∉ GOALST then
         a) PROOF(¬θ, CURLINE);
         b) GOTO 6;
      ii) if θ ∉ ANTELINES and ¬ψ ∉ GOALST then
         a) PROOF(¬ψ, CURLINE);
         b) GOTO 6;

9. (UIPROHIB)

   a. OLDCUR := CURLINE;
   b. UIPROHIB();
   c. if OLDCUR≠CURLINE then GOTO 6 else GOTO 10


10. (HELP)

   a. Print(PRMAT);
   b. Read(X);
   c. if X = "Show" ψ then
      i) PROOF(ψ, CURLINE);
      ii) GOTO 6;
      else
      i) PRMAT(CURLINE,1) := ψ;
      ii) PRMAT(CURLINE,2) := "GOD says so";
      iii) CURLINE := CURLINE+1;
      iv) GOTO 6;

11. (Wrap up proof)

   a. for i:=n+1 until CURLINE do
      i) DELANTE(PRMAT(i,1));
      ii) Prefix '|' to PRMAT(i,1);
   b. ADDANTE(PRMAT(n,1));
   c. Prefix '*' to PRMAT(n,1);
   d. CURLEVEL := CURLEVEL-1;
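
The wrap-up step (boxing the lines of the finished subproof, cancelling its show line, and making the proved formula available as an antecedent) might be sketched as follows. This is an illustration under stated assumptions, not THINKER's code: PRMAT is modelled as a dictionary from line number to a record, ANTELINES as a plain set, CURLINE is taken to point one past the last filled line, and '*' is used as the cancel mark because that is how cancelled show lines appear in the proof listings of Appendix I.

```python
def wrap_up(prmat, antelines, n, curline, curlevel):
    """Close the subproof whose show line is n; return the new CURLEVEL.

    prmat maps line number -> {"formula": ..., "mark": ...}.
    Lines n+1 .. curline-1 are assumed to be the body of the subproof.
    """
    for i in range(n + 1, curline):
        antelines.discard(prmat[i]["formula"])      # DELANTE: no longer usable
        prmat[i]["mark"] = "|" + prmat[i]["mark"]   # box the line
    antelines.add(prmat[n]["formula"])              # ADDANTE: the proved goal
    prmat[n]["mark"] = "*" + prmat[n]["mark"]       # cancel the show
    return curlevel - 1


# Hypothetical two-line proof of (P -> P): show line 1, assumption on line 2.
prmat = {
    1: {"formula": ("->", "P", "P"), "mark": "show"},
    2: {"formula": "P", "mark": ""},
}
antelines = {"P"}
level = wrap_up(prmat, antelines, n=1, curline=3, curlevel=1)
print(level, prmat[1]["mark"], prmat[2]["mark"], antelines)
```

Because n is passed by value, each nested call knows where its own subproof began, which is what lets an embedded subproof be boxed without disturbing the enclosing one.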
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